Abstract: Consider a testing problem for the null hypothesis $H_0: \theta \in \Theta_0$. The standard frequentist practice is to reject the null hypothesis when the p-value is smaller than a threshold value $\alpha$, usually 0.05. We ask how many of the null hypotheses a frequentist rejects are actually true. Precisely, we look at the Bayesian false discovery rate $\delta_n = P_g(\theta \in \Theta_0 \mid \text{p-value} < \alpha)$ under a proper prior density $g(\theta)$. This depends on the prior $g$, the sample size $n$, the threshold value $\alpha$, and the choice of the test statistic. We show that the Benjamini-Hochberg FDR in fact converges to $\delta_n$ almost surely under $g$ for any fixed $n$. For one-sided null hypotheses, we derive a third-order asymptotic expansion for $\delta_n$ in the continuous exponential family when the test statistic is the MLE, and in the location family when the test statistic is the sample median. We also briefly mention the expansion in the uniform family when the test statistic is the MLE. The expansions are derived by putting together Edgeworth expansions for the CDF, Cornish-Fisher expansions for the quantile function and various Taylor expansions. Numerical results show that the expansions are very accurate even for a small value of $n$ (e.g., $n = 10$). We draw several useful conclusions from these expansions, in particular that the frequentist is not prone to false discoveries except when the prior $g$ is too spiky. The results are illustrated by many examples.



1. Introduction 

In a strikingly interesting short note, Soric raised the question of establishing upper bounds on the proportion of fictitious statistical discoveries in a battery of independent experiments. Thus, if $m$ null hypotheses are tested independently, of which $m_0$ happen to be true, but $V$ among these $m_0$ are rejected at a significance level $\alpha$, and another $S$ among the false ones are also rejected, Soric essentially suggested $E(V)/(V + S)$ as a measure of the false discovery rate in the chain of $m$ independent experiments. Benjamini and Hochberg [3] then looked at the question in much greater detail, gave a careful discussion of what a correct formulation for the false discovery rate of a group of frequentists should be, and provided a concrete procedure that actually controls the groupwise false discovery rate. The problem is simultaneously theoretically attractive, socially relevant, and practically important. The practical importance comes from its obvious relation to statistical discoveries made in clinical trials and in modern microarray experiments. The continued importance of the problem is reflected in two recent articles, Efron, and Storey [21], who provide serious Bayesian connections and advancements in the problem. See also Storey [20], Storey, Taylor and Siegmund [22], Storey and Tibshirani [23], Genovese and Wasserman, and Finner and Roters, among many others in this currently active area.

1 Department of Statistics, Purdue University, 150 North University Street, West Lafayette, IN 47907-2067, e-mail: dasgupta@stat.purdue.edu

2 Department of Statistics, Purdue University, 150 North University Street, West Lafayette, IN 47907-2067, e-mail: tlzhang@stat.purdue.edu

AMS 2000 subject classifications: primary 62F05; secondary 62F03, 62F15.

Keywords and phrases: Cornish-Fisher expansions, Edgeworth expansions, exponential families, false discovery rate, location families, MLE, p-value.

False discovery rates of a frequentist

Around the same time that Soric raised the issue of fictitious frequentist discoveries made by a mechanical adoption of p-values, a different debate was brewing in the foundation literature. Berger and Sellke [2], in a thought-provoking article, gave analytical foundations to the thesis in Edwards, Lindman and Savage [6] that the frequentist practice of rejecting a sharp null at a traditional 5% level amounts to a rush to judgment against the null hypothesis. By deriving lower bounds or exact values for the minimum value of the posterior probability of a sharp null hypothesis over a variety of classes of priors, Berger and Sellke [2] argued that p-values traditionally regarded as small understate the plausibility of nulls, at least in some problems. Casella and Berger gave a collection of theorems showing that the discrepancy disappears under broad conditions if the null hypothesis is composite one-sided. Since the articles of Berger and Sellke [2] and Casella and Berger, there has been an avalanche of activity in the foundation literature on the safety of the use of p-values in testing problems. See Hall and Sellinger, Sellke, Bayarri and Berger [18], Marden [14] and Schervish [17] for a contemporary exposition.

It is conceptually clear that the frequentist FDR literature and the foundation literature were both talking about a similar issue: is the frequentist practice of rejecting nulls at traditional p-values an invitation to rampant false discoveries? The structural difference was that the FDR literature did not introduce a formal prior on the unknown parameters, while the foundation literature did not go into multiple testing, as is the case in microarray or other emerging applications. The purpose of this article is to marry the two schools together, while giving a new rigorous analysis of the question "how many of the null hypotheses a frequentist rejects are actually true?" and of its flip side, namely, "how many of the null hypotheses a frequentist accepts are actually false?" The calculations are completely different from what previous researchers have done, although we then demonstrate that our formulation directly relates both to the traditional FDR calculations and to the foundational effort in Berger and Sellke [2] and others. We thus have a dual goal: providing a new approach, and integrating it with the two existing approaches.

In Section 2, we demonstrate the connection in very great generality, with practically no structural assumptions at all. This was comforting. As regards concrete results, it seems appropriate to look at the one-parameter exponential family, it being the first structured case one would want to investigate. In Section 3, we do so, using the MLE as the test statistic. In Section 4, we look at a general location parameter, but use the median as the test statistic. We use the median for two reasons. First, for general location parameters, the median is credible as a test statistic, while the mean obviously is not. Second, it is important to investigate the extent to which the answers depend on the choice of the test statistic; by studying the median, we get an opportunity to compare the answers for the mean and the median in the special normal case. To be specific, let us consider the one-sided testing problem based on an i.i.d. sample $X_1, \ldots, X_n$ from a distribution family with parameter $\theta$ in the parameter space $\Omega$, which is an interval of $\mathbb{R}$. Without loss of generality, we assume $\Omega = (\underline{\theta}, \bar{\theta})$ with $-\infty \le \underline{\theta} < \bar{\theta} \le \infty$. We consider the testing problem

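$$H_0: \theta \le \theta_0 \quad \text{vs} \quad H_1: \theta > \theta_0,$$

where $\theta_0 \in (\underline{\theta}, \bar{\theta})$. Suppose the level-$\alpha$ test, $0 < \alpha < 1$, rejects $H_0$ if $T_n \in C$, where $T_n$ is a test statistic. We study the behavior of the quantities

$$\delta_n = P(\theta \le \theta_0 \mid T_n \in C) = P(H_0 \mid \text{p-value} < \alpha)$$

and

$$\varepsilon_n = P(\theta > \theta_0 \mid T_n \notin C) = P(H_1 \mid \text{p-value} > \alpha).$$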

Note that $\delta_n$ and $\varepsilon_n$ are inherently Bayesian quantities. By an almost egregious abuse of nomenclature, we will refer to $\delta_n$ and $\varepsilon_n$ as type I and type II errors in this article. Our principal objective is to obtain third-order asymptotic expansions for $\delta_n$ and $\varepsilon_n$ assuming a proper Bayesian prior for $\theta$. Suppose $g(\theta)$ is any sufficiently smooth proper prior density of $\theta$. In the regular case, the expansion for $\delta_n$ we obtain is of the form

$$\delta_n = \frac{P(\theta \le \theta_0, T_n \in C)}{P(T_n \in C)} = \frac{c_1}{\sqrt{n}} + \frac{c_2}{n} + \frac{c_3}{n^{3/2}} + O(n^{-2}), \tag{1}$$

and the expansion for $\varepsilon_n$ is of the form

$$\varepsilon_n = \frac{P(\theta > \theta_0, T_n \notin C)}{P(T_n \notin C)} = \frac{d_1}{\sqrt{n}} + \frac{d_2}{n} + \frac{d_3}{n^{3/2}} + O(n^{-2}), \tag{2}$$

where the coefficients $c_1, c_2, c_3, d_1, d_2$ and $d_3$ depend on the problem, the test statistic $T_n$, the value of $\alpha$ and the prior density $g(\theta)$. In the nonregular case, the expansion differs qualitatively; for both $\delta_n$ and $\varepsilon_n$ the successive terms are in powers of $1/n$ instead of powers of $1/\sqrt{n}$. Our ability to derive a third-order expansion results in a surprisingly accurate approximation, sometimes for $n$ as small as $n = 4$. The asymptotic expansions we derive are not just of theoretical interest; they let us conclude interesting things, as in Sections 3.2 and 4.5, that would be impossible to conclude from the exact expressions for $\delta_n$ and $\varepsilon_n$.

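The expansions of $\delta_n$ and $\varepsilon_n$ require the expansions of the numerators and the denominators of (1) and (2), respectively. In the regular case, the expansion of the numerator of (1) is of the form

$$A_n = P(\theta \le \theta_0, T_n \in C) = \frac{a_1}{\sqrt{n}} + \frac{a_2}{n} + \frac{a_3}{n^{3/2}} + O(n^{-2}), \tag{3}$$

and the expansion of the numerator of (2) is of the form

$$\bar{A}_n = P(\theta > \theta_0, T_n \notin C) = \frac{\bar{a}_1}{\sqrt{n}} + \frac{\bar{a}_2}{n} + \frac{\bar{a}_3}{n^{3/2}} + O(n^{-2}). \tag{4}$$

Then, the expansion of the denominator of (1) is

$$B_n = P(T_n \in C) = A_n + \lambda - \bar{A}_n = \lambda - \frac{b_1}{\sqrt{n}} - \frac{b_2}{n} - \frac{b_3}{n^{3/2}} + O(n^{-2}), \tag{5}$$

where $\lambda = P(\theta > \theta_0) = \int_{\theta_0}^{\bar{\theta}} g(\theta)\,d\theta$ is assumed to satisfy $0 < \lambda < 1$, and $b_1 = \bar{a}_1 - a_1$, $b_2 = \bar{a}_2 - a_2$ and $b_3 = \bar{a}_3 - a_3$; the expansion of the denominator of (2) is

$$\bar{B}_n = P(T_n \notin C) = 1 - B_n = 1 - \lambda + \frac{b_1}{\sqrt{n}} + \frac{b_2}{n} + \frac{b_3}{n^{3/2}} + O(n^{-2}). \tag{6}$$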
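Then, expanding $\delta_n = A_n/B_n$ and $\varepsilon_n = \bar{A}_n/\bar{B}_n$ in powers of $1/\sqrt{n}$, we have

$$c_1 = \frac{a_1}{\lambda}, \quad c_2 = \frac{a_2}{\lambda} + \frac{a_1 b_1}{\lambda^2}, \quad c_3 = \frac{a_3}{\lambda} + \frac{a_1 b_2 + a_2 b_1}{\lambda^2} + \frac{a_1 b_1^2}{\lambda^3},$$
$$d_1 = \frac{\bar{a}_1}{1-\lambda}, \quad d_2 = \frac{\bar{a}_2}{1-\lambda} - \frac{\bar{a}_1 b_1}{(1-\lambda)^2}, \quad d_3 = \frac{\bar{a}_3}{1-\lambda} - \frac{\bar{a}_1 b_2 + \bar{a}_2 b_1}{(1-\lambda)^2} + \frac{\bar{a}_1 b_1^2}{(1-\lambda)^3}. \tag{7}$$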
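Turning the numerator and denominator series into the coefficients of $\delta_n = A_n/B_n$ and $\varepsilon_n = \bar{A}_n/\bar{B}_n$ is plain algebra: one expands $1/B_n$ and $1/\bar{B}_n$ as geometric series around $\lambda$ and $1-\lambda$. The Python sketch below does exactly that; the function name `ratio_coefficients` is ours, chosen for illustration, and the only assumption is $0 < \lambda < 1$. It ends with a consistency check against direct division at a large $n$, where the discrepancy should be of order $n^{-2}$.

```python
def ratio_coefficients(a, abar, lam):
    """Return (c1, c2, c3) and (d1, d2, d3) from the series for A_n and Abar_n.

    a    = (a1, a2, a3):    A_n    = a1/sqrt(n) + a2/n + a3/n**1.5 + O(n**-2)
    abar = (ab1, ab2, ab3): Abar_n = ab1/sqrt(n) + ab2/n + ab3/n**1.5 + O(n**-2)
    lam  = P(theta > theta0), assumed to lie strictly between 0 and 1.
    """
    a1, a2, a3 = a
    ab1, ab2, ab3 = abar
    b1, b2, b3 = ab1 - a1, ab2 - a2, ab3 - a3   # B_n = lam - b1/sqrt(n) - b2/n - ...
    q = 1.0 - lam                               # Bbar_n = q + b1/sqrt(n) + b2/n + ...

    # delta_n = A_n / B_n, with 1/B_n = (1/lam) * (1 + u + u**2 + ...), u = (lam - B_n)/lam
    c1 = a1 / lam
    c2 = a2 / lam + a1 * b1 / lam**2
    c3 = a3 / lam + (a1 * b2 + a2 * b1) / lam**2 + a1 * b1**2 / lam**3

    # eps_n = Abar_n / Bbar_n, with 1/Bbar_n = (1/q) * (1 - v + v**2 - ...), v = (Bbar_n - q)/q
    d1 = ab1 / q
    d2 = ab2 / q - ab1 * b1 / q**2
    d3 = ab3 / q - (ab1 * b2 + ab2 * b1) / q**2 + ab1 * b1**2 / q**3
    return (c1, c2, c3), (d1, d2, d3)


# Consistency check: compare with direct division at n = 10**6 (error should be O(n**-2)).
a, abar, lam = (0.3, -0.2, 0.5), (0.9, 0.4, -0.1), 0.4
c, d = ratio_coefficients(a, abar, lam)
s = 1e-3                                        # 1/sqrt(n) for n = 10**6
A = a[0] * s + a[1] * s**2 + a[2] * s**3
Abar = abar[0] * s + abar[1] * s**2 + abar[2] * s**3
B = A + lam - Abar
print(abs(A / B - (c[0] * s + c[1] * s**2 + c[2] * s**3)))
```

Working from the $a_i$ and $\bar{a}_i$ rather than from $b_i$ directly mirrors how the text defines $b_i = \bar{a}_i - a_i$, so the same function serves both the exponential-family and the median cases.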
We will frequently use three notations in the expansions: the standard normal PDF $\phi$, the standard normal CDF $\Phi$ and the standard normal upper $\alpha$ quantile $z_\alpha = \Phi^{-1}(1 - \alpha)$.

The principal ingredients of our calculations are Edgeworth expansions, Cornish-Fisher expansions and Taylor expansions. The derivation of the expansions became very complex, but in the end we learn a number of interesting things. We learn that typically the false discovery rate $\delta_n$ is small, and smaller than the pre-experimental claim $\alpha$ for quite small $n$. We learn that typically $\varepsilon_n > \delta_n$, so that the frequentist is less vulnerable to false discovery than to false acceptance. We learn that only priors very spiky at the boundary between $H_0$ and $H_1$ can cause large false discovery rates. We also learn that these phenomena do not really change if the test statistic is changed. So while the article is technically complex and the calculations are long, the consequences are rewarding. The analogous expansions are qualitatively different in the nonregular case; we could not report them here due to shortage of space. We should also add that we leave open the question of establishing these expansions for problems with nuisance parameters, multivariate problems, and dependent data. Results similar to ours are expected in such problems.

2. Connection to Benjamini and Hochberg, Storey and Efron's work 

Suppose there are $m$ groups of i.i.d. samples $X_{i1}, \ldots, X_{in}$ for $i = 1, \ldots, m$. Assume $X_{i1}, \ldots, X_{in}$ are i.i.d. with a common density $f(x, \theta_i)$, where the $\theta_i$ are assumed i.i.d. with a CDF $G(\theta)$, which need not have a density in this section. Then the prior $G(\theta)$ connects our Bayesian false discovery rate $\delta_n$ to the usual frequentist false discovery rate. In the context of our hypothesis testing problem, the frequentist false discovery rate, which has been recently discussed by Benjamini and Hochberg, Efron and Storey, is defined as

$$\mathrm{FDR} = E_{\theta_1, \ldots, \theta_m}\left[\frac{\sum_{i=1}^m 1_{\{T_{ni} \in C,\ \theta_i \le \theta_0\}}}{\left(\sum_{i=1}^m 1_{\{T_{ni} \in C\}}\right) \vee 1}\right], \tag{8}$$

where $T_{ni}$ is the test statistic based on the samples $X_{i1}, \ldots, X_{in}$. It will be shown below that for any fixed $n$, as $m \to \infty$, the frequentist false discovery rate FDR goes to the Bayesian false discovery rate $\delta_n$ almost surely under the prior distribution $G(\theta)$.

We will compare the numerators and the denominators of FDR in (8) and of $\delta_n$ in (1), respectively. Since the comparisons are almost identical, we discuss the comparison between the numerators only. We denote by $E_\theta(\cdot)$ and $V_\theta(\cdot)$ the conditional mean and variance given the true parameter $\theta$, and by $E(\cdot)$ and $V(\cdot)$ the marginal mean and variance under the prior $G(\theta)$. Let $Y_i = 1_{\{T_{ni} \in C,\ \theta_i \le \theta_0\}}$. Then, given $\theta_1, \ldots, \theta_m$, the $Y_i$ ($i = 1, \ldots, m$) are independent Bernoulli random variables with mean values $\mu_i = \mu_i(\theta_i) = E_{\theta_i}(Y_i)$, and marginally the $\mu_i$ are i.i.d. with expected value $A_n$ in (3). Let

$$D_m = \frac{1}{m}\sum_{i=1}^m Y_i - A_n = \frac{1}{m}\sum_{i=1}^m (Y_i - \mu_i) + \frac{1}{m}\sum_{i=1}^m (\mu_i - A_n).$$
Note that we assume that $\theta_1, \ldots, \theta_m$ are i.i.d. with a common CDF $G(\theta)$. The second term goes to 0 almost surely by the Strong Law of Large Numbers (SLLN) for identically distributed random variables. Note that for any given $\theta_1, \ldots, \theta_m$, the variables $Y_1, \ldots, Y_m$ are independent but not identically distributed, with $E_{\theta_i}(Y_i) = \mu_i$, $V_{\theta_i}(Y_i) = \mu_i(1 - \mu_i)$ and $\sum_{i=1}^\infty i^{-2} V_{\theta_i}(Y_i) \le \sum_{i=1}^\infty i^{-2} < \infty$. The first term also goes to 0 almost surely by an SLLN for independent but not identically distributed random variables. Therefore, $D_m$ goes to 0 almost surely. The comparison of the denominators is handled similarly. Therefore, for almost all sequences $\theta_1, \theta_2, \ldots$,

$$\frac{\sum_{i=1}^m 1_{\{T_{ni} \in C,\ \theta_i \le \theta_0\}}}{\left(\sum_{i=1}^m 1_{\{T_{ni} \in C\}}\right) \vee 1} \to \frac{A_n}{B_n} = \delta_n \quad \text{almost surely}$$

as $m \to \infty$.

Since $\sum_{i=1}^m 1_{\{T_{ni} \in C,\ \theta_i \le \theta_0\}} \le \left(\sum_{i=1}^m 1_{\{T_{ni} \in C\}}\right) \vee 1$, their ratio is bounded by 1 and hence uniformly integrable. And so, FDR as defined in (8) also converges to $\delta_n$ as $m \to \infty$ for almost all sequences $\theta_1, \theta_2, \ldots$.

This gives a pleasant, exact connection between our approach and the established indices formulated by previous researchers. Of course, for fixed $m$, the frequentist FDR need not be close to our $\delta_n$.
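The almost-sure limit can be checked by simulation. The sketch below assumes the normal-mean setting of Example 1 in Section 3 ($X_{ij} \sim N(\theta_i, 1)$, rejection when $\sqrt{n}\bar{X}_i > z_\alpha$) together with a hypothetical $N(0, \tau^2)$ prior. It draws $m$ experiments, records the realized proportion $V/(R \vee 1)$ of rejected nulls that are true (the quantity inside the expectation in the FDR definition), and compares it with $\delta_n = A_n/B_n$ obtained by Simpson integration, using the closed form $B_n = 1 - \Phi(z_\alpha/\sqrt{1 + n\tau^2})$. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()


def normal_sf(x):
    """Upper-tail probability 1 - Phi(x), computed with erfc for accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simpson(f, lo, hi, m=2000):
    """Composite Simpson rule on [lo, hi] with an even number m of subintervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def bayes_fdr(n, alpha, tau):
    """delta_n = A_n / B_n for N(theta, 1) data, N(0, tau^2) prior, test sqrt(n)*xbar > z_alpha."""
    z = Z.inv_cdf(1.0 - alpha)
    prior = NormalDist(0.0, tau).pdf
    A = simpson(lambda t: normal_sf(z - math.sqrt(n) * t) * prior(t), -12.0 * tau, 0.0)
    B = normal_sf(z / math.sqrt(1.0 + n * tau**2))  # sqrt(n)*xbar ~ N(0, 1 + n tau^2) marginally
    return A / B


def realized_fdp(m, n, alpha, tau, seed=0):
    """V / (R v 1) over m independent experiments with theta_i ~ N(0, tau^2)."""
    rng = random.Random(seed)
    z = Z.inv_cdf(1.0 - alpha)
    false_rej = rejections = 0
    for _ in range(m):
        theta = rng.gauss(0.0, tau)
        stat = math.sqrt(n) * theta + rng.gauss(0.0, 1.0)   # sqrt(n) * xbar_i
        if stat > z:
            rejections += 1
            false_rej += theta <= 0.0                        # H0 true but rejected
    return false_rej / max(rejections, 1)


exact = bayes_fdr(n=10, alpha=0.05, tau=1.0)
for m in (1_000, 10_000, 200_000):
    print(m, realized_fdp(m, n=10, alpha=0.05, tau=1.0), exact)
```

For small $m$ the realized proportion fluctuates noticeably around $\delta_n$, which is the practical content of the remark that for fixed $m$ the frequentist FDR need not be close to $\delta_n$.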



3. Continuous one-parameter exponential family 

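Assume the density of the i.i.d. sample $X_1, \ldots, X_n$ is in the form of a one-parameter exponential family $f_\theta(x) = b(x)e^{\theta x - a(\theta)}$ for $x \in \mathcal{X} \subset \mathbb{R}$, where the natural space $\Omega$ of $\theta$ is an interval of $\mathbb{R}$ and $a(\theta) = \log\int_{\mathcal{X}} b(x)e^{\theta x}\,dx$. Without loss of generality, we can assume $\Omega$ is open, so that one can write $\Omega = (\underline{\theta}, \bar{\theta})$ for $-\infty \le \underline{\theta} < \bar{\theta} \le \infty$. All derivatives of $a(\theta)$ exist at every $\theta \in \Omega$ and can be derived by formally differentiating under the integral sign ([3], p. 34). This implies that $a'(\theta) = E_\theta(X_1)$ and $a''(\theta) = \mathrm{Var}_\theta(X_1)$ for every $\theta \in \Omega$. Let us denote $\mu(\theta) = a'(\theta)$, $\sigma(\theta) = \sqrt{a''(\theta)}$, $\kappa_i(\theta) = a^{(i)}(\theta)$ and $\rho_i(\theta) = \kappa_i(\theta)/\sigma^i(\theta)$ for $i \ge 3$, where $a^{(i)}(\theta)$ represents the $i$-th derivative of $a(\theta)$. Then $\mu(\theta)$, $\sigma(\theta)$, $\kappa_i(\theta)$ and $\rho_i(\theta)$ all exist and are continuous at every $\theta \in \Omega$ ([4], p. 36), and $\mu(\theta)$ is non-decreasing in $\theta$ since $\mu'(\theta) = a''(\theta) = \sigma^2(\theta) \ge 0$ for all $\theta \in \Omega$.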

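Let $\mu_0 = \mu(\theta_0)$, $\sigma_0 = \sigma(\theta_0)$, $\kappa_{i0} = \kappa_i(\theta_0)$ and $\rho_{i0} = \rho_i(\theta_0)$ for $i \ge 3$, and assume $\sigma_0 > 0$. The usual $\alpha$ ($0 < \alpha < 1$) level UMP test ([15], p. 80) for the testing problem $H_0: \theta \le \theta_0$ vs $H_1: \theta > \theta_0$ rejects $H_0$ if $\bar{X} \in C$, where

$$C = \left\{\bar{X} : \frac{\sqrt{n}(\bar{X} - \mu_0)}{\sigma_0} > k_{\theta_0,n}\right\}, \tag{9}$$

and $k_{\theta_0,n}$ is determined from $P_{\theta_0}\{\sqrt{n}(\bar{X} - \mu_0)/\sigma_0 > k_{\theta_0,n}\} = \alpha$; $\lim_{n\to\infty} k_{\theta_0,n} = z_\alpha$. Let

$$\beta_n(\theta) = P_\theta\left\{\frac{\sqrt{n}(\bar{X} - \mu_0)}{\sigma_0} > k_{\theta_0,n}\right\}. \tag{10}$$

Then, using the transformation $x = \sigma_0\sqrt{n}(\theta - \theta_0) - z_\alpha$ under the integral sign below, we have

$$A_n = \int_{\underline{\theta}}^{\theta_0}\beta_n(\theta)g(\theta)\,d\theta = \frac{1}{\sigma_0\sqrt{n}}\int_{\underline{x}}^{-z_\alpha}\beta_n\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)g\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)dx \tag{11}$$

and

$$\bar{A}_n = \frac{1}{\sigma_0\sqrt{n}}\int_{-z_\alpha}^{\bar{x}}\left[1 - \beta_n\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)\right]g\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)dx, \tag{12}$$

where $\underline{x} = \sigma_0\sqrt{n}(\underline{\theta} - \theta_0) - z_\alpha$ and $\bar{x} = \sigma_0\sqrt{n}(\bar{\theta} - \theta_0) - z_\alpha$.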
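The critical value $k_{\theta_0,n}$ differs from $z_\alpha$ at order $1/\sqrt{n}$; the standard first Cornish-Fisher correction for a standardized mean with skewness $\rho_{30}/\sqrt{n}$ is $k_{\theta_0,n} \approx z_\alpha + \rho_{30}(z_\alpha^2 - 1)/(6\sqrt{n})$. As a sketch, assuming the exponential case of Example 2 below at the boundary value, where $\mu_0 = \sigma_0 = 1$, $\rho_{30} = 2$ and $n\bar{X}$ is Gamma with shape $n$, the code finds the exact quantile by bisection on the Poisson form of the Gamma tail and compares it with $z_\alpha$ and with the corrected value. Function names are illustrative.

```python
import math
from statistics import NormalDist


def sum_exceeds(t, n):
    """P(X_1 + ... + X_n > t) for i.i.d. Exp(1) variables, via the Poisson identity
    P(Gamma(n, 1) > t) = sum_{k<n} e^{-t} t^k / k!, evaluated in log space."""
    if t <= 0.0:
        return 1.0
    return sum(math.exp(-t + k * math.log(t) - math.lgamma(k + 1)) for k in range(n))


def exact_k(n, alpha):
    """Exact upper-alpha quantile of sqrt(n) * (Xbar - 1) under Exp(1), by bisection."""
    lo, hi = 0.0, 10.0 * n + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum_exceeds(mid, n) > alpha:
            lo = mid          # tail beyond mid is still heavier than alpha: move right
        else:
            hi = mid
    return math.sqrt(n) * (0.5 * (lo + hi) / n - 1.0)


def cornish_fisher_k(n, alpha, rho3):
    """First-order Cornish-Fisher approximation to the upper-alpha quantile."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z + rho3 * (z * z - 1.0) / (6.0 * math.sqrt(n))


for n in (5, 10, 20, 50):
    print(n, exact_k(n, 0.05), cornish_fisher_k(n, 0.05, rho3=2.0))
```

At $n = 10$ and $\alpha = 0.05$ the exact value is about 1.804, versus $z_\alpha \approx 1.645$ and a corrected value of about 1.825, which illustrates why the expansions need the Cornish-Fisher step rather than the normal quantile alone.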

Since for an interior parameter $\theta$ all moments of the exponential family exist and are continuous in $\theta$, we can find $\theta_1$ and $\theta_2$ satisfying $\underline{\theta} < \theta_1 < \theta_0$ and $\theta_0 < \theta_2 < \bar{\theta}$ such that for any $\theta \in [\theta_1, \theta_2]$, $\sigma^2(\theta)$, $\kappa_3(\theta)$, $\kappa_4(\theta)$, $\kappa_5(\theta)$, $g(\theta)$, $g'(\theta)$, $g''(\theta)$ and $g^{(3)}(\theta)$ are uniformly bounded in absolute value, and the minimum value of $\sigma^2(\theta)$ is positive. After we pick $\theta_1$ and $\theta_2$, we partition each of $A_n$ and $\bar{A}_n$ into two parts so that one part is negligible in the expansion. Then, the rest of the work in the expansion is to find the coefficients of the other part.

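To describe these partitions, we define $\theta_{1n} = \theta_0 + (\theta_1 - \theta_0)/n^{1/3}$, $\theta_{2n} = \theta_0 + (\theta_2 - \theta_0)/n^{1/3}$, $x_{1n} = \sigma_0\sqrt{n}(\theta_{1n} - \theta_0) - z_\alpha$ and $x_{2n} = \sigma_0\sqrt{n}(\theta_{2n} - \theta_0) - z_\alpha$. Let

$$A_{n,\theta_{1n}} = \frac{1}{\sigma_0\sqrt{n}}\int_{x_{1n}}^{-z_\alpha}\beta_n\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)g\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)dx, \tag{13}$$

$$R_{n,\theta_{1n}} = \frac{1}{\sigma_0\sqrt{n}}\int_{\underline{x}}^{x_{1n}}\beta_n\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)g\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)dx, \tag{14}$$

$$\bar{A}_{n,\theta_{2n}} = \frac{1}{\sigma_0\sqrt{n}}\int_{-z_\alpha}^{x_{2n}}\left[1 - \beta_n\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)\right]g\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)dx, \tag{15}$$

and

$$\bar{R}_{n,\theta_{2n}} = \frac{1}{\sigma_0\sqrt{n}}\int_{x_{2n}}^{\bar{x}}\left[1 - \beta_n\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)\right]g\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right)dx. \tag{16}$$

Then, $A_n = A_{n,\theta_{1n}} + R_{n,\theta_{1n}}$ and $\bar{A}_n = \bar{A}_{n,\theta_{2n}} + \bar{R}_{n,\theta_{2n}}$. In the Appendix, we show that for any $t > 0$, $\lim_{n\to\infty} n^t R_{n,\theta_{1n}} = \lim_{n\to\infty} n^t \bar{R}_{n,\theta_{2n}} = 0$. Therefore, it is enough to compute the coefficients of the expansions for $A_{n,\theta_{1n}}$ and $\bar{A}_{n,\theta_{2n}}$. Among the steps of the expansions, the key step is to compute the expansions of $\beta_n(\theta_0 + (x + z_\alpha)/(\sigma_0\sqrt{n}))$ for $x \in [x_{1n}, -z_\alpha]$ and of $1 - \beta_n(\theta_0 + (x + z_\alpha)/(\sigma_0\sqrt{n}))$ for $x \in [-z_\alpha, x_{2n}]$ under the integral sign, since the expansion of $g(\theta_0 + (x + z_\alpha)/(\sigma_0\sqrt{n}))$ in (13) and (15) is easily obtained as

$$g\!\left(\theta_0 + \frac{x + z_\alpha}{\sigma_0\sqrt{n}}\right) = g(\theta_0) + g'(\theta_0)\frac{x + z_\alpha}{\sigma_0\sqrt{n}} + \frac{g''(\theta_0)}{2}\frac{(x + z_\alpha)^2}{\sigma_0^2 n} + O(n^{-3/2}). \tag{17}$$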

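After a lengthy calculation, we have

$$A_{n,\theta_{1n}} = \frac{1}{\sigma_0\sqrt{n}}\int_{x_{1n}}^{-z_\alpha}\left[\Phi(x) + \frac{\phi(x)g_1(x)}{\sqrt{n}} + \frac{\phi(x)g_2(x)}{n}\right]\left[g(\theta_0) + g'(\theta_0)\frac{x + z_\alpha}{\sigma_0\sqrt{n}} + \frac{g''(\theta_0)}{2}\frac{(x + z_\alpha)^2}{\sigma_0^2 n}\right]dx + O(n^{-2}) \tag{18}$$

and

$$\bar{A}_{n,\theta_{2n}} = \frac{1}{\sigma_0\sqrt{n}}\int_{-z_\alpha}^{x_{2n}}\left[1 - \Phi(x) - \frac{\phi(x)g_1(x)}{\sqrt{n}} - \frac{\phi(x)g_2(x)}{n}\right]\left[g(\theta_0) + g'(\theta_0)\frac{x + z_\alpha}{\sigma_0\sqrt{n}} + \frac{g''(\theta_0)}{2}\frac{(x + z_\alpha)^2}{\sigma_0^2 n}\right]dx + O(n^{-2}), \tag{19}$$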
where

$$g_1(x) = \frac{\rho_{30}}{6}x^2 + \frac{z_\alpha\rho_{30}}{2}x + \frac{z_\alpha^2\rho_{30}}{3} \tag{20}$$

and $g_2(x)$ is a polynomial of degree five in $x$,

$$g_2(x) = -\frac{\rho_{30}^2}{72}x^5 - \frac{z_\alpha\rho_{30}^2}{12}x^4 + (\text{terms of degree at most three in } x), \tag{21}$$

whose lower-degree coefficients depend on $z_\alpha$, $\rho_{30}$ and $\rho_{40}$.

The expressions for $g_1(x)$ and $g_2(x)$ are derived in the Appendix; their derivation forms the dominant part of the calculation leading to (18) and (19), and involves the use of Cornish-Fisher as well as Edgeworth expansions.
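On using (18), (19), (20) and (21), we have the following expansion:

$$A_{n,\theta_{1n}} = \frac{a_1}{\sqrt{n}} + \frac{a_2}{n} + \frac{a_3}{n^{3/2}} + O(n^{-2}), \tag{22}$$

where

$$a_1 = \frac{g(\theta_0)}{\sigma_0}\left[\phi(z_\alpha) - \alpha z_\alpha\right],$$
$$a_2 = \frac{g(\theta_0)\rho_{30}}{6\sigma_0}\left[\alpha + 2\alpha z_\alpha^2 - 2z_\alpha\phi(z_\alpha)\right] - \frac{g'(\theta_0)}{2\sigma_0^2}\left[\alpha(z_\alpha^2 + 1) - z_\alpha\phi(z_\alpha)\right],$$
$$a_3 = \frac{g''(\theta_0)}{6\sigma_0^3}\left[(z_\alpha^2 + 2)\phi(z_\alpha) - \alpha(z_\alpha^3 + 3z_\alpha)\right] + \frac{g'(\theta_0)\rho_{30}}{3\sigma_0^2}\left[\alpha(z_\alpha^3 + 2z_\alpha) - (z_\alpha^2 + 1)\phi(z_\alpha)\right] + \frac{g(\theta_0)}{\sigma_0}\int_{-\infty}^{-z_\alpha}\phi(x)g_2(x)\,dx. \tag{23}$$

Similarly,

$$\bar{A}_{n,\theta_{2n}} = \frac{\bar{a}_1}{\sqrt{n}} + \frac{\bar{a}_2}{n} + \frac{\bar{a}_3}{n^{3/2}} + O(n^{-2}), \tag{24}$$

where $\bar{a}_1 = [g(\theta_0)/\sigma_0][\phi(z_\alpha) + (1 - \alpha)z_\alpha]$, $\bar{a}_2 = [g'(\theta_0)/(2\sigma_0^2)][(1 - \alpha)(z_\alpha^2 + 1) + z_\alpha\phi(z_\alpha)] - [\rho_{30}g(\theta_0)/(6\sigma_0)][(1 - \alpha)(1 + 2z_\alpha^2) + 2z_\alpha\phi(z_\alpha)]$ and $\bar{a}_3 = [g''(\theta_0)/(6\sigma_0^3)][(z_\alpha^2 + 2)\phi(z_\alpha) + (1 - \alpha)(z_\alpha^3 + 3z_\alpha)] - [g'(\theta_0)\rho_{30}/(3\sigma_0^2)][(z_\alpha^2 + 1)\phi(z_\alpha) + (1 - \alpha)(z_\alpha^3 + 2z_\alpha)] - [g(\theta_0)/\sigma_0]\int_{-z_\alpha}^{\infty}\phi(x)g_2(x)\,dx$. The details of the expansions for $A_{n,\theta_{1n}}$ and $\bar{A}_{n,\theta_{2n}}$ are given in the Appendix. Because the remainders $R_{n,\theta_{1n}}$ and $\bar{R}_{n,\theta_{2n}}$ are of smaller order than $n^{-2}$, as we commented before, the expansions in (22) and (24) are the expansions for $A_n$ and $\bar{A}_n$ in (3) and (4), respectively.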

The expansions of $\delta_n$ and $\varepsilon_n$ in (1) and (2) can now be obtained by letting $\lambda = \int_{\theta_0}^{\bar{\theta}} g(\theta)\,d\theta$, $b_1 = \bar{a}_1 - a_1$, $b_2 = \bar{a}_2 - a_2$ and $b_3 = \bar{a}_3 - a_3$ in (7).
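As a worked check of the leading coefficients, integration by parts (using $\phi'(x) = -x\phi(x)$) gives $\int_{-\infty}^{c}\Phi(x)\,dx = c\,\Phi(c) + \phi(c)$. Applying it to the $n^{-1/2}$ terms of (18) and (19), replacing $x_{1n}, x_{2n}$ by $\mp\infty$ (an exponentially small change) and using $\Phi(-z_\alpha) = \alpha$:

$$a_1 = \frac{g(\theta_0)}{\sigma_0}\int_{-\infty}^{-z_\alpha}\Phi(x)\,dx = \frac{g(\theta_0)}{\sigma_0}\big[\phi(z_\alpha) - \alpha z_\alpha\big],$$
$$\bar{a}_1 = \frac{g(\theta_0)}{\sigma_0}\int_{-z_\alpha}^{\infty}\big[1 - \Phi(x)\big]dx = \frac{g(\theta_0)}{\sigma_0}\int_{-\infty}^{z_\alpha}\Phi(u)\,du = \frac{g(\theta_0)}{\sigma_0}\big[(1 - \alpha)z_\alpha + \phi(z_\alpha)\big],$$
$$b_1 = \bar{a}_1 - a_1 = \frac{g(\theta_0)\,z_\alpha}{\sigma_0}.$$

Since $B_n \to \lambda$ and $\bar{B}_n \to 1 - \lambda$, the leading coefficients are then $c_1 = a_1/\lambda$ and $d_1 = \bar{a}_1/(1 - \lambda)$.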
3.1. Examples

Example 1. Let $X_1, \ldots, X_n$ be i.i.d. $N(\theta, 1)$. Since $\theta$ is a location parameter, there is no loss of generality in letting $\theta_0 = 0$. Thus consider testing $H_0: \theta \le 0$ vs $H_1: \theta > 0$. Clearly, we have $\mu(\theta) = \theta$, $\sigma(\theta) = 1$ and $\rho_i(\theta) = \kappa_i(\theta) = 0$ for all $i \ge 3$.

The $\alpha$ ($0 < \alpha < 1$) level UMP test rejects $H_0$ if $\sqrt{n}\bar{X} > z_\alpha$. For a three times continuously differentiable prior $g(\theta)$ for $\theta$, one can simply plug the values $\mu_0 = 0$, $\sigma_0 = 1$, $\rho_{30} = \rho_{40} = 0$ into (23) and into the coefficients of the expansion in (24) to get the coefficients $a_1 = g(0)[\phi(z_\alpha) - \alpha z_\alpha]$, $a_2 = -g'(0)[\alpha(z_\alpha^2 + 1) - z_\alpha\phi(z_\alpha)]/2$, $a_3 = g''(0)[(z_\alpha^2 + 2)\phi(z_\alpha) - \alpha(z_\alpha^3 + 3z_\alpha)]/6$, $\bar{a}_1 = g(0)[\phi(z_\alpha) + (1 - \alpha)z_\alpha]$, $\bar{a}_2 = g'(0)[(1 - \alpha)(z_\alpha^2 + 1) + z_\alpha\phi(z_\alpha)]/2$ and $\bar{a}_3 = g''(0)[(z_\alpha^2 + 2)\phi(z_\alpha) + (1 - \alpha)(z_\alpha^3 + 3z_\alpha)]/6$. Substituting $a_1, a_2, a_3, \bar{a}_1, \bar{a}_2$ and $\bar{a}_3$ into (7), one derives the expansions of $\delta_n$ and $\varepsilon_n$ given by (1) and (2), respectively.

If the prior density function is also assumed to be symmetric, then $\lambda = 1/2$ and $g'(0) = 0$. In this case, the coefficients of the expansion of $\delta_n$ in (1) are given explicitly as follows: $c_1 = 2g(0)[\phi(z_\alpha) - \alpha z_\alpha]$, $c_2 = 4z_\alpha[g(0)]^2[\phi(z_\alpha) - \alpha z_\alpha]$, $c_3 = 2\phi(z_\alpha)\{4z_\alpha^2[g(0)]^3 + g''(0)(z_\alpha^2 + 2)/6\} - \alpha\{g''(0)(z_\alpha^3 + 3z_\alpha)/3 + 8z_\alpha^3[g(0)]^3\}$, and the coefficients of the expansion of $\varepsilon_n$ in (2) are $d_1 = 2g(0)[(1 - \alpha)z_\alpha + \phi(z_\alpha)]$, $d_2 = -4z_\alpha[g(0)]^2[(1 - \alpha)z_\alpha + \phi(z_\alpha)]$, $d_3 = 2\phi(z_\alpha)\{4z_\alpha^2[g(0)]^3 + g''(0)(z_\alpha^2 + 2)/6\} + (1 - \alpha)\{g''(0)(z_\alpha^3 + 3z_\alpha)/3 + 8z_\alpha^3[g(0)]^3\}$.

Two specific prior distributions for $\theta$ are now considered for numerical illustration. In the first one we choose $\theta \sim N(0, \tau^2)$, and in the second one we choose $\theta/\tau \sim t_m$, where $\tau$ is a scale parameter. Clearly $g^{(3)}(\theta)$ is continuous in $\theta$ in both cases.

If $g(\theta)$ is the density of $\theta$ when $\theta \sim N(0, \tau^2)$, then $\lambda = 1/2$, $g(0) = 1/[\sqrt{2\pi}\,\tau]$, $g'(0) = 0$ and $g''(0) = -1/[\sqrt{2\pi}\,\tau^3]$.

We calculated the numerical values of $c_1, c_2, c_3, d_1, d_2$ and $d_3$ as functions of $\alpha$ when $\theta \sim N(0, 1)$. We note that $c_1$ is a monotone increasing function and $d_1$ a monotone decreasing function of $\alpha$. However, $c_2, d_2$ and $c_3, d_3$ are not monotone; in fact, $d_2$ is decreasing when $\alpha$ is close to 1 (not shown), $c_3$ also takes negative values, and $d_3$ takes positive values for larger values of $\alpha$.
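The monotonicity of $c_1$ and $d_1$ can be confirmed analytically: since $dz_\alpha/d\alpha = -1/\phi(z_\alpha)$ and $\phi'(z) = -z\phi(z)$, one gets $dc_1/d\alpha = 2g(0)\alpha/\phi(z_\alpha) > 0$ and $dd_1/d\alpha = -2g(0)(1 - \alpha)/\phi(z_\alpha) < 0$. The short Python sketch below checks this on a grid of $\alpha$ values, assuming the $N(0, 1)$ prior (any $g(0) > 0$ only rescales both curves); the helper name is illustrative.

```python
import math
from statistics import NormalDist

N01 = NormalDist()
G0 = 1.0 / math.sqrt(2.0 * math.pi)   # g(0) for the N(0, 1) prior


def first_order_coefficients(alpha, g0=G0):
    """c1 and d1 for the normal-mean test under a symmetric prior with density g0 at 0."""
    z = N01.inv_cdf(1.0 - alpha)
    phi = N01.pdf(z)
    c1 = 2.0 * g0 * (phi - alpha * z)
    d1 = 2.0 * g0 * ((1.0 - alpha) * z + phi)
    return c1, d1


grid = [k / 100 for k in range(1, 100)]
values = [first_order_coefficients(a) for a in grid]
c1_increasing = all(x[0] < y[0] for x, y in zip(values, values[1:]))
d1_decreasing = all(x[1] > y[1] for x, y in zip(values, values[1:]))
print(c1_increasing, d1_decreasing)   # True True
```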

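If $g(\theta)$ is the density of $\theta$ when $\theta/\tau \sim t_m$, then $\lambda = 1/2$, $g'(0) = 0$, $g(0) = \Gamma(\frac{m+1}{2})/[\tau\sqrt{m\pi}\,\Gamma(\frac{m}{2})]$ and $g''(0) = -(m + 1)\Gamma(\frac{m+1}{2})/[\tau^3 m\sqrt{m\pi}\,\Gamma(\frac{m}{2})]$. Putting those values into the corresponding expressions, we get the coefficients $c_1, c_2, c_3$ and $d_1, d_2, d_3$ of the expansions of $\delta_n$ and $\varepsilon_n$. When $m = 1$, the results are exactly the same as for the Cauchy prior on $\theta$.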
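The prior constants that enter $c_i$ and $d_i$ can be computed directly. The sketch below (helper names are illustrative) evaluates $g(0)$ and $g''(0)$ for the two priors, using log-gamma values for numerical stability; differentiating $(1 + t^2/m)^{-(m+1)/2}$ twice at $t = 0$ gives $g''(0) = -(m + 1)g(0)/(m\tau^2)$. For $m = 1$ it reproduces the Cauchy values $g(0) = 1/(\pi\tau)$ and $g''(0) = -2/(\pi\tau^3)$, and for large $m$ it approaches the normal values.

```python
import math


def normal_prior_constants(tau):
    """(g(0), g''(0)) for theta ~ N(0, tau^2)."""
    g0 = 1.0 / (math.sqrt(2.0 * math.pi) * tau)
    return g0, -g0 / tau**2


def t_prior_constants(m, tau):
    """(g(0), g''(0)) for theta / tau ~ t_m (Student t with m degrees of freedom)."""
    log_c = math.lgamma((m + 1) / 2.0) - math.lgamma(m / 2.0) - 0.5 * math.log(m * math.pi)
    g0 = math.exp(log_c) / tau
    return g0, -(m + 1) * g0 / (m * tau**2)


print(t_prior_constants(1, 1.0))       # approximately (1/pi, -2/pi): the Cauchy prior
print(t_prior_constants(10**6, 1.0))   # close to the N(0, 1) values below
print(normal_prior_constants(1.0))
```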

Numerical results very similar to those for the normal prior are seen for the Cauchy case. From Figure 1 we see that for each of the normal and the Cauchy priors, only about 1% of those null hypotheses a frequentist rejects with a p-value of less than 5% are true. Indeed, quite typically, $\delta_n < \alpha$ even for very small values of $n$. This is discussed in greater detail in Section 4.5. This finding seems to be quite interesting.

The true values of $\delta_n$ and $\varepsilon_n$ are computed by taking an average of the lower and the upper Riemann sums in $A_n$, $\bar{A}_n$, $B_n$ and $\bar{B}_n$ with the exact formulae for the standard normal pdf. The accuracy of the expansion for $\delta_n$ is remarkable, as can be seen in Figure 1. Even for $n = 4$, the true value of $\delta_n$ is almost identical to the expansion in (1). The accuracy of the expansion for $\varepsilon_n$ is very good (even if it is not as good as that for $\delta_n$). For $n = 20$, the true value of $\varepsilon_n$ is almost identical to the expansion in (2).
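A comparison of this kind can be sketched in a few lines. The Python code below assumes the $N(0, 1)$ prior ($g(0) = 1/\sqrt{2\pi}$, $g''(0) = -g(0)$) and evaluates the true $\delta_n$ with composite Simpson integration for $A_n$ (in place of the averaged Riemann sums used for the figures) together with the closed form $B_n = 1 - \Phi(z_\alpha/\sqrt{1 + n})$; it then evaluates the three-term expansion with the symmetric-prior coefficients $c_1, c_2, c_3$ given above. Function names are illustrative.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def normal_sf(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))       # 1 - Phi(x)


def simpson(f, lo, hi, m=4000):
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def delta_exact(n, alpha):
    """delta_n for N(theta, 1) data, N(0, 1) prior, level-alpha test sqrt(n)*xbar > z_alpha."""
    z = N01.inv_cdf(1.0 - alpha)
    A = simpson(lambda t: normal_sf(z - math.sqrt(n) * t) * N01.pdf(t), -12.0, 0.0)
    B = normal_sf(z / math.sqrt(1.0 + n))           # sqrt(n)*xbar ~ N(0, 1 + n) marginally
    return A / B


def delta_expansion(n, alpha):
    """c1/sqrt(n) + c2/n + c3/n^{3/2} for the symmetric N(0, 1) prior."""
    z = N01.inv_cdf(1.0 - alpha)
    phi = N01.pdf(z)
    g0 = 1.0 / math.sqrt(2.0 * math.pi)             # g(0)
    g2 = -g0                                        # g''(0) for the N(0, 1) prior
    h = phi - alpha * z
    c1 = 2.0 * g0 * h
    c2 = 4.0 * z * g0**2 * h
    c3 = (2.0 * phi * (4.0 * z**2 * g0**3 + g2 * (z**2 + 2.0) / 6.0)
          - alpha * (g2 * (z**3 + 3.0 * z) / 3.0 + 8.0 * z**3 * g0**3))
    return c1 / math.sqrt(n) + c2 / n + c3 / n**1.5


for n in (4, 10, 20, 50, 100):
    print(n, delta_exact(n, 0.05), delta_expansion(n, 0.05))
```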

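Example 2. Let $X_1, \ldots, X_n$ be i.i.d. Exp($\theta$), with density $f_\theta(x) = \theta e^{-\theta x}$ for $x > 0$. Clearly, $\mu(\theta) = 1/\theta$, $\sigma^2(\theta) = 1/\theta^2$, $\rho_3(\theta) = 2$ and $\rho_4(\theta) = 6$. Let $\tilde{\theta} = -\theta$. Then,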
[Figure: panels including n = 1 and n = 2; horizontal axes show α from 0.0 to 0.5.]

Fig 1. True and estimated values of $\delta_n$ as functions of $\alpha$ for the standard normal prior and the Cauchy prior.



one can write the density of $X_1$ in the standard form of the exponential family as $f_{\tilde{\theta}}(x) = e^{\tilde{\theta}x + \log(-\tilde{\theta})}$. The natural parameter space of $\tilde{\theta}$ is $\Omega = (-\infty, 0)$. If $g(\theta)$ is a prior density for $\theta$ on $(0, \infty)$, then $g(-\tilde{\theta})$ is a prior density for $\tilde{\theta}$ on $(-\infty, 0)$. Since $\theta$ is a scale parameter, it is enough to look at the case $\tilde{\theta}_0 = -1$. In terms of $\theta$, therefore, the problem considered is to test $H_0: \theta \ge 1$ vs $H_1: \theta < 1$. The $\alpha$ ($0 < \alpha < 1$) level UMP test for this problem rejects $H_0$ if $\bar{X} > T_{\alpha,n,n}$, where $T_{\alpha,r,s}$ is the upper $\alpha$ quantile of the Gamma distribution with parameters $r$ and $s$. If $g(\theta)$ is three times continuously differentiable, then we can simply put the values $\mu_0 = 1$, $\sigma_0 = 1$, $\rho_{30} = 2$, $\rho_{40} = 6$ and $\lambda = \int_0^1 g(\theta)\,d\theta$ into (23) and into the coefficients of the expansion in (24) to get the coefficients $a_1, a_2, a_3, \bar{a}_1, \bar{a}_2$ and $\bar{a}_3$, and then get the expansions of $\delta_n$ and $\varepsilon_n$ in (1) and (2), respectively.

Two priors are considered in this example. The first one is the Gamma prior with density $g(\theta) = s^r\theta^{r-1}e^{-s\theta}/\Gamma(r)$, where $r$ and $s$ are known constants. It would be natural to have the mode of $g(\theta)$ at 1, that is, $s = r - 1$. In this case, $g'(1) = 0$, $g(1) = (r - 1)^r e^{-(r-1)}/\Gamma(r)$ and $g''(1) = -(r - 1)^{r+1}e^{-(r-1)}/\Gamma(r)$.

Next, consider the F prior with $2r$ and $2s$ degrees of freedom for $\theta/\tau$, where $\tau > 0$ is a scale parameter. Then, the prior density for $\theta$ is $g(\theta) = \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)\tau}\left(\frac{r}{s}\right)^r\left(\frac{\theta}{\tau}\right)^{r-1}\left(1 + \frac{r\theta}{s\tau}\right)^{-(r+s)}$. To make the mode of $g(\theta)$ equal to 1, we have to choose $\tau = r(s + 1)/[s(r - 1)]$. Then $g'(1) = 0$, $g(1) = \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}\left(\frac{r}{s\tau}\right)^r\left(1 + \frac{r}{s\tau}\right)^{-(r+s)}$ and $g''(1) = -\frac{(r - 1)(s + 1)}{r + s}\,g(1)$.

Exact and estimated values of $\delta_n$ are plotted in Figure 3. At $n = 20$, the expansion is clearly extremely accurate and, as in Example 1, we see that the false discovery rate $\delta_n$ is very small even for $n = 10$.

Fig 2. True and estimated values of $\varepsilon_n$ as functions of $\alpha$ for the standard normal prior and the Cauchy prior.

Fig 3. True and estimated values of $\delta_n$ as functions of $\alpha$ under $\Gamma(2, 1)$ and $F(4, 4)$ priors for $\theta$ when $X \sim \mathrm{Exp}(\theta)$.
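For Example 2 the exact $\delta_n$ is also easy to compute, because $X_1 + \cdots + X_n$ is Gamma with shape $n$ and its tail is a Poisson partial sum. The Python sketch below assumes the $\Gamma(2, 1)$ prior of Figure 3 (mode at 1, i.e. $s = r - 1$), finds the critical value of the UMP test at $\theta = 1$ by bisection, and integrates the rejection probability against the prior with Simpson's rule, truncating the prior at an illustrative upper limit of 60 where its mass is negligible. Function names are illustrative.

```python
import math


def sum_exceeds(t, n, rate):
    """P(X_1 + ... + X_n > t) for i.i.d. Exp(rate) observations (Poisson partial sum)."""
    x = rate * t
    if x <= 0.0:
        return 1.0
    return sum(math.exp(-x + k * math.log(x) - math.lgamma(k + 1)) for k in range(n))


def critical_sum(n, alpha):
    """t with P_{theta=1}(X_1 + ... + X_n > t) = alpha, found by bisection."""
    lo, hi = 0.0, 10.0 * n + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum_exceeds(mid, n, 1.0) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simpson(f, lo, hi, m=2000):
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def gamma_prior(r, s):
    """Gamma(r, s) density s^r theta^(r-1) e^(-s theta) / Gamma(r) on theta > 0."""
    return lambda th: s**r * th ** (r - 1) * math.exp(-s * th) / math.gamma(r)


def delta_exp(n, alpha, prior, upper=60.0):
    """delta_n = P(theta >= 1 | reject H0) for H0: theta >= 1 vs H1: theta < 1."""
    t = critical_sum(n, alpha)                      # reject H0 when X_1 + ... + X_n > t
    f = lambda th: sum_exceeds(t, n, th) * prior(th)
    false_rej = simpson(f, 1.0, upper)              # theta >= 1: H0 is true
    return false_rej / (false_rej + simpson(f, 0.0, 1.0))


g = gamma_prior(2.0, 1.0)                           # mode at theta = 1
for n in (5, 10, 20):
    print(n, delta_exp(n, 0.05, g))
```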

3.2. The frequentist is more prone to type II error

Consider the two Bayesian error rates

$$\delta_n = P(H_0 \mid \text{Frequentist rejects } H_0)$$

and

$$\varepsilon_n = P(H_1 \mid \text{Frequentist accepts } H_0).$$

Is there an inequality between $\delta_n$ and $\varepsilon_n$? Rather interestingly, when $\theta$ is the normal mean and the testing problem is $H_0: \theta \le 0$ versus $H_1: \theta > 0$, there is an approximate inequality in the sense that if we consider the respective coefficients $c_1$ and $d_1$ of the $1/\sqrt{n}$ term, then for any symmetric prior (because then $g'(0) = 0$ and $\lambda = 1 - \lambda = 1/2$), we have

$$c_1 = 2g(0)[\phi(z_\alpha) - \alpha z_\alpha] < d_1 = 2g(0)[(1 - \alpha)z_\alpha + \phi(z_\alpha)]$$

for any $\alpha < 1/2$. It is interesting that this inequality holds regardless of the exact choice of $g(\cdot)$ and the value of $\alpha$, as long as $\alpha < 1/2$. Thus, to first order, the frequentist is less prone to type I error. Even the exact values of $\delta_n$ and $\varepsilon_n$ satisfy this inequality, unless $\alpha$ is small, as can be seen, for example, from a scrutiny of Figures 1 and 2. This would suggest that a frequentist needs to be more mindful of premature acceptance of $H_0$ than of its premature rejection in the composite one-sided problem. This is in contrast to the conclusion reached in Berger and Sellke [2] under their formulation.
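The reason the inequality does not depend on the shape of $g$ or on the particular $\alpha < 1/2$ is that the $\phi(z_\alpha)$ terms cancel in the difference:

$$d_1 - c_1 = 2g(0)\big[(1 - \alpha)z_\alpha + \phi(z_\alpha)\big] - 2g(0)\big[\phi(z_\alpha) - \alpha z_\alpha\big] = 2g(0)\,z_\alpha,$$

so, whenever $g(0) > 0$, $c_1 < d_1$ exactly when $z_\alpha > 0$, that is, when $\alpha < 1/2$.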

4. General location parameter case 

As we mentioned in Section 1, the quantities $\delta_n, \varepsilon_n$ depend on the choice of the test statistic. For location parameter problems, in general there is no reason to use the sample mean as the test statistic. For many non-normal location parameter densities, such as the double exponential, it is more natural to use the sample median as the test statistic.

Assume the density of the i.i.d. sample $X_1, \ldots, X_n$ is $f(x - \theta)$, where the median of $f(\cdot)$ is 0, and assume $f(0) > 0$. Then an asymptotic size-$\alpha$ test for

$$H_0: \theta \le 0 \quad \text{vs} \quad H_1: \theta > 0$$

rejects $H_0$ if $\sqrt{n}\,T_n > z_\alpha/[2f(0)]$, where $T_n = X_{([n/2]+1)}$ is the sample median ([8], p. 89), since $\sqrt{n}(T_n - \theta) \xrightarrow{d} N(0, 1/[4f^2(0)])$. We will derive the coefficients $c_1, c_2, c_3$ in (1) and $d_1, d_2, d_3$ in (2) given the prior density $g(\theta)$ for $\theta$. We assume again that $g(\theta)$ is three times differentiable with a bounded absolute third derivative.

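4.1. Expansion of type I error and type II error

To obtain the coefficients of the expansions of $\delta_n$ in (1) and $\varepsilon_n$ in (2), we have to expand $A_n$ and $\bar{A}_n$ given by (3) and (4). Of these,

$$A_n = P\!\left(\theta \le 0,\ \sqrt{n}\,T_n > \frac{z_\alpha}{2f(0)}\right) = \frac{1}{\sqrt{n}}\int_{-\infty}^{0}\{1 - F_n[z_\alpha - 2xf(0)]\}\,g\!\left(\frac{x}{\sqrt{n}}\right)dx, \tag{25}$$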
where $F_n$ is the CDF of $2f(0)\sqrt{n}(T_n - \theta)$ if the true median is $\theta$. Reiss [16] gives the expansion of $F_n$ as

$$F_n(t) = \Phi(t) + \frac{\phi(t)R_1(t)}{\sqrt{n}} + \frac{\phi(t)R_2(t)}{n} + r_{t,n}, \tag{26}$$

where, with $\{x\}$ denoting the fractional part of a real $x$, $R_1(t) = f_{11}t^2 + f_{12}$, $f_{11} = f'(0)/[4f^2(0)]$, $f_{12} = -(1 - 2\{n/2\})$, and $R_2(t) = f_{21}t^5 + f_{22}t^3 + f_{23}t$, where $f_{21} = -[f'(0)/f^2(0)]^2/32$, $f_{22} = 1/4 + (1/2 - \{n/2\})[f'(0)/(2f^2(0))] + f''(0)/[24f^3(0)]$ and $f_{23} = 1/4 - (1 - 2\{n/2\})^2/2$. The error term $r_{t,n}$ can be written as $r_{t,n} = \phi(t)R_3(t)/n^{3/2} + O(n^{-2})$, where $R_3(t)$ is a polynomial.

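By letting $y = 2xf(0) - z_\alpha$ in (25), we have

$$A_n = \frac{1}{2f(0)\sqrt{n}}\int_{-\infty}^{-z_\alpha}\left\{\Phi(y) - \frac{\phi(y)R_1(-y)}{\sqrt{n}} - \frac{\phi(y)R_2(-y)}{n} - r_{-y,n}\right\} \times \left[g(0) + g'(0)\frac{y + z_\alpha}{2f(0)\sqrt{n}} + g''(0)\frac{(y + z_\alpha)^2}{8f^2(0)n} + g^{(3)}(y^*)\frac{(y + z_\alpha)^3}{48f^3(0)n^{3/2}}\right]dy, \tag{27}$$

where $y^*$ is between 0 and $(y + z_\alpha)/[2f(0)\sqrt{n}]$.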

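Hence, assuming $\sup_\theta|g^{(3)}(\theta)| < \infty$, on exact integration of each product of functions in (27) and on collapsing the terms, we get

$$A_n = \frac{a_1}{\sqrt{n}} + \frac{a_2}{n} + \frac{a_3}{n^{3/2}} + O(n^{-2}), \tag{28}$$

where

$$a_1 = \frac{g(0)}{2f(0)}\left[\phi(z_\alpha) - \alpha z_\alpha\right], \tag{29}$$

$$a_2 = \frac{g'(0)}{8f^2(0)}\left[z_\alpha\phi(z_\alpha) - \alpha(z_\alpha^2 + 1)\right] - \frac{g(0)}{2f(0)}\left\{f_{11}\left[z_\alpha\phi(z_\alpha) + \alpha\right] + f_{12}\alpha\right\} \tag{30}$$

and

$$a_3 = \frac{g''(0)}{48f^3(0)}\left[(z_\alpha^2 + 2)\phi(z_\alpha) - \alpha(z_\alpha^3 + 3z_\alpha)\right] - \frac{g'(0)}{4f^2(0)}\left\{f_{11}\left[\alpha z_\alpha - 2\phi(z_\alpha)\right] + f_{12}\left[\alpha z_\alpha - \phi(z_\alpha)\right]\right\} - \frac{g(0)}{2f(0)}\left\{f_{21}(z_\alpha^4 + 4z_\alpha^2 + 8) + f_{22}(z_\alpha^2 + 2) + f_{23}\right\}\phi(z_\alpha). \tag{31}$$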



We claim the error term in (28) is $O(n^{-2})$. To prove this, we need to look at its exact form, namely,

$$-\frac{1}{2f(0)\sqrt{n}}\int_{-\infty}^{-z_\alpha}\left\{\frac{\phi(y)R_3(-y)}{n^{3/2}} + O(n^{-2})\right\}g\!\left(\frac{y + z_\alpha}{2f(0)\sqrt{n}}\right)dy.$$

Since $g(\theta)$ is uniformly bounded in absolute value, the first term above is $O(n^{-2})$. The second term is obviously $O(n^{-2})$. This shows that the error term in (28) is $O(n^{-2})$.
As regards $\bar{A}_n$ given by (4), one can similarly obtain

$$\bar{A}_n = P\!\left(\theta > 0,\ \sqrt{n}\,T_n \le \frac{z_\alpha}{2f(0)}\right) = \frac{1}{\sqrt{n}}\int_0^\infty F_n[z_\alpha - 2f(0)x]\,g\!\left(\frac{x}{\sqrt{n}}\right)dx = \frac{\bar{a}_1}{\sqrt{n}} + \frac{\bar{a}_2}{n} + \frac{\bar{a}_3}{n^{3/2}} + O(n^{-2}), \tag{32}$$

where $\bar{a}_1 = [g(0)/(2f(0))][(1 - \alpha)z_\alpha + \phi(z_\alpha)]$, $\bar{a}_2 = [g'(0)/(8f^2(0))][(1 - \alpha)(z_\alpha^2 + 1) + z_\alpha\phi(z_\alpha)] + [g(0)/(2f(0))]\{f_{11}[(1 - \alpha) - z_\alpha\phi(z_\alpha)] + f_{12}(1 - \alpha)\}$ and $\bar{a}_3 = [g''(0)/(48f^3(0))][(z_\alpha^2 + 2)\phi(z_\alpha) + (1 - \alpha)(z_\alpha^3 + 3z_\alpha)] + [g'(0)/(4f^2(0))]\{f_{11}[(1 - \alpha)z_\alpha + 2\phi(z_\alpha)] + f_{12}[(1 - \alpha)z_\alpha + \phi(z_\alpha)]\} - [g(0)/(2f(0))]\{f_{21}(z_\alpha^4 + 4z_\alpha^2 + 8) + f_{22}(z_\alpha^2 + 2) + f_{23}\}\phi(z_\alpha)$. The error term in (32) is still $O(n^{-2})$, and its proof is omitted.

Therefore, we have the expansion of $B_n$ given by (5), $B_n = \lambda - b_1/\sqrt{n} - b_2/n - b_3/n^{3/2} + O(n^{-2})$, where $\lambda = \int_0^\infty g(\theta)\,d\theta$ as before, $b_1 = \bar{a}_1 - a_1 = z_\alpha g(0)/[2f(0)]$, $b_2 = \bar{a}_2 - a_2 = g'(0)(z_\alpha^2 + 1)/[8f^2(0)] + g(0)(f_{11} + f_{12})/[2f(0)]$ and $b_3 = \bar{a}_3 - a_3 = g''(0)(z_\alpha^3 + 3z_\alpha)/[48f^3(0)] + z_\alpha g'(0)(f_{11} + f_{12})/[4f^2(0)]$. Substituting $a_1, a_2, a_3, \bar{a}_1, \bar{a}_2, \bar{a}_3, b_1, b_2$ and $b_3$ into (7), we get the expansions of $\delta_n$ and $\varepsilon_n$ for the general location parameter case given by (1) and (2).



4.2. Testing with mean vs. testing with median

Suppose $X_1, \ldots, X_n$ are i.i.d. observations from an $N(\theta, 1)$ density and the statistician tests $H_0: \theta \le 0$ vs. $H_1: \theta > 0$ by using either the sample mean $\bar{X}$ or the median $T_n$. It is natural to ask which choice of statistic makes him more vulnerable to false discoveries. We could look at both false discovery rates $\delta_n$ and $\varepsilon_n$ to make this comparison, but we do so only for the type I error rate $\delta_n$ here.

We assume for algebraic simplicity that $g$ is symmetric, so that $g'(0) = 0$ and $\lambda = 1/2$. Also, to keep track of the two statistics, we denote the coefficients $c_1, c_2$ by $c_{1,\bar{X}}, c_{1,T_n}, c_{2,\bar{X}}$ and $c_{2,T_n}$, respectively. Then, from our expansions in Sections 3.1 and 4.1, it follows that

$$c_{1,T_n} - c_{1,\bar{X}} = g(0)\left(\phi(z_\alpha) - \alpha z_\alpha\right)(\sqrt{2\pi} - 2) = a \ (\text{say}),$$

and

$$c_{2,T_n} - c_{2,\bar{X}} = g^2(0)z_\alpha\left(\phi(z_\alpha) - \alpha z_\alpha\right)(2\pi - 4) - g(0)\sqrt{2\pi}f_{12}\alpha \ge g^2(0)z_\alpha\left(\phi(z_\alpha) - \alpha z_\alpha\right)(2\pi - 4) = b \ (\text{say}),$$

as $f_{12} \le 0$.



Hence, there exist positive constants $a, b$ such that $\liminf_{n\to\infty} \sqrt n\,\big[\sqrt n(\delta_{n,T_n} - \delta_{n,\bar X}) - a\big] \ge b$, i.e., the statistician is more vulnerable to a type I false discovery by using the sample median as his test statistic. Now, of course, as a point estimator, $T_n$ is less efficient than the mean $\bar X$ in the normal case. Thus, the statistician is more vulnerable to a false discovery if he uses the less efficient point estimator as his test statistic. We find this neat connection between efficiency in estimation and false discovery rates in testing interesting. Similar connections are, of course, well known in the literature on Pitman efficiencies of tests; see, e.g., van der Vaart (0,p. 201).
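The two gaps above can be evaluated numerically. The Python sketch below assumes, as in the text, $N(\theta, 1)$ data, a symmetric prior with $g'(0) = 0$ and $\lambda = 1/2$, and a user-supplied $f_{12} \le 0$; the helper name is hypothetical. It returns $a$, the lower bound $b$, and the full second-order gap $c_{2,T_n} - c_{2,\bar X}$.

```python
import math
from statistics import NormalDist


def mean_vs_median_gaps(alpha, g0, f12):
    """Section 4.2 quantities for N(theta, 1) data and a symmetric prior.

    Returns (a, b, c2_gap): a = c_{1,T_n} - c_{1,Xbar}; b = the stated lower
    bound; c2_gap = c_{2,T_n} - c_{2,Xbar}. g0 = g(0); f12 <= 0 is user supplied.
    """
    std = NormalDist()
    z = std.inv_cdf(1 - alpha)             # z_alpha (> 0 when alpha < 1/2)
    mills_gap = std.pdf(z) - alpha * z     # phi(z_alpha) - alpha z_alpha > 0
    a = g0 * mills_gap * (math.sqrt(2 * math.pi) - 2)
    b = g0 ** 2 * z * mills_gap * (2 * math.pi - 4)
    c2_gap = b - g0 * math.sqrt(2 * math.pi) * f12 * alpha
    return a, b, c2_gap


print(mean_vs_median_gaps(0.05, 1 / math.sqrt(2 * math.pi), -1.0))
```

Since $\sqrt{2\pi} > 2$ and $\phi(z_\alpha) > \alpha z_\alpha$, both $a$ and $b$ come out positive for any $\alpha < 1/2$.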
4.3. Examples

In this subsection, we study the exact values and the expansions of $\delta_n$ and $\varepsilon_n$ in two examples. In one example, $f(x) = \phi(x)$ and $g(\theta) = \phi(\theta)$; in the other, $f$ and $g$ are both standard Cauchy densities. For convenience, we refer to them as normal-normal and Cauchy-Cauchy. The purpose of the first example is comparison with the normal-normal case when the test statistic was the sample mean (Example 2 in Section 3); the second example is an independent natural example.

For exact numerical evaluation of $\delta_n$ and $\varepsilon_n$, the following formula is needed. The pdf of the standardized median $2f(0)\sqrt n(T_n - \theta)$ is

(33) Ut) = 2/ ( 0) Ky^^^y^^ 1 ~ F( 2f(0W^ )r ~ §l 
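As an illustration of the change of variables behind the density of the standardized median, here is a minimal Python sketch for the special case of odd $n$, assuming $T_n = X_{((n+1)/2)}$ (this definition of the median and the helper names are our assumptions; even $n$ is not covered). It combines the standard order-statistic density with the rescaling by $2f(0)\sqrt n$ and checks numerically that the result integrates to about one in the normal and Cauchy cases.

```python
import math
from statistics import NormalDist


def std_median_pdf(t, n, F, f, f0):
    """Density at t of Z = 2 f(0) sqrt(n) (T_n - theta), odd n only.

    F, f are the CDF and PDF of X - theta and f0 = f(0). Uses the
    order-statistic density n!/(m! m!) F^m (1 - F)^m f, m = (n - 1)/2.
    """
    if n % 2 == 0:
        raise ValueError("this sketch only covers odd n")
    m = (n - 1) // 2
    c = 2.0 * f0 * math.sqrt(n)          # standardizing constant
    x = t / c                            # back on the scale of T_n - theta
    const = n * math.comb(n - 1, m)      # equals n! / (m! m!) when n = 2m + 1
    Fx = F(x)
    return const * Fx ** m * (1.0 - Fx) ** m * f(x) / c


def trapezoid(h, a, b, steps=4000):
    """Composite trapezoidal rule for h on [a, b]."""
    dx = (b - a) / steps
    inner = sum(h(a + i * dx) for i in range(1, steps))
    return dx * (0.5 * (h(a) + h(b)) + inner)


def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi


def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))


N = NormalDist()
normal_mass = trapezoid(
    lambda t: std_median_pdf(t, 11, N.cdf, N.pdf, N.pdf(0.0)), -12.0, 12.0)
cauchy_mass = trapezoid(
    lambda t: std_median_pdf(t, 11, cauchy_cdf, cauchy_pdf, 1.0 / math.pi),
    -12.0, 12.0)
print(normal_mass, cauchy_mass)
```

The Cauchy total falls slightly short of one because the polynomial tails of the median's density extend beyond the truncation at $\pm 12$.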
We are now ready to present our examples. 

Example 3. Suppose $X_1, X_2, \dots, X_n$ are i.i.d. $N(\theta, 1)$ and $g(\theta) = \phi(\theta)$. Then $g(0) = f(0) = 1/\sqrt{2\pi}$, $g'(0) = f'(0) = 0$ and $g''(0) = f''(0) = -1/\sqrt{2\pi}$. Hence $f_{11} = 0$, $f_{12} = -(1 - 2\{n/2\})$, $f_{21} = 0$, $f_{22} = 1/4 - \pi/12$ and $f_{23} = 1/4 - (1/2 - \{n/2\})^2$. Plugging these values of $f_{11}, f_{12}, f_{21}, f_{22}, f_{23}$ into (29), (30) and (31), we obtain the expansion of $\delta_n$, and similarly of $\varepsilon_n$, in the normal-normal case.

Next we consider the Cauchy-Cauchy case, i.e., $X_1, \dots, X_n$ are i.i.d. with density function $f(x) = 1/\{\pi[1 + (x - \theta)^2]\}$ and $g(\theta) = 1/[\pi(1 + \theta^2)]$. Then $f(0) = 1/\pi$, $f'(0) = 0$ and $f''(0) = -2/\pi$. Therefore, $f_{11} = 0$, $f_{12} = -(1 - 2\{n/2\})$, $f_{21} = 0$, $f_{22} = 1/4 - \pi^2/12$ and $f_{23} = 1/4 - (1/2 - \{n/2\})^2$. Plugging these values of $f_{11}, f_{12}, f_{21}, f_{22}, f_{23}$ into (29), (30) and (31), we obtain the expansion of $\delta_n$, and similarly of $\varepsilon_n$, in the Cauchy-Cauchy case.
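The parity dependence in these coefficients is easy to tabulate. A minimal Python sketch, assuming $\{n/2\}$ denotes the fractional part of $n/2$ (so it is 0 for even $n$ and 1/2 for odd $n$); the helper names are hypothetical.

```python
import math


def frac_half(n):
    """Fractional part of n/2: 0.0 for even n, 0.5 for odd n."""
    return (n / 2) % 1.0


def median_f_coefficients(n, family):
    """(f11, f12, f21, f22, f23) as listed for the two examples.

    family is "normal" (Example 3) or "cauchy" (Cauchy-Cauchy case).
    """
    if family == "normal":
        f22 = 0.25 - math.pi / 12
    elif family == "cauchy":
        f22 = 0.25 - math.pi ** 2 / 12
    else:
        raise ValueError("family must be 'normal' or 'cauchy'")
    h = frac_half(n)
    f12 = -(1 - 2 * h)            # -1 for even n, 0 for odd n
    f23 = 0.25 - (0.5 - h) ** 2   #  0 for even n, 1/4 for odd n
    return 0.0, f12, 0.0, f22, f23


for n in (10, 11):
    print(n, median_f_coefficients(n, "normal"), median_f_coefficients(n, "cauchy"))
```

Only $f_{22}$ distinguishes the two families; the parity of $n$ enters through $f_{12}$ and $f_{23}$.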

The true and estimated values of $\delta_n$ for selected $n$ are given in Figure 4 and Figure 5. As before, the true values of $\delta_n$ and $\varepsilon_n$ are computed by averaging the lower and upper Riemann sums for $A_n$, $\bar A_n$, $B_n$ and $\bar B_n$, using the exact formula (33). The two values are almost identical when $n = 30$. By comparison with Figure [TJ, we see that the expansion for the median is not as precise as the expansion for the sample mean.

The most important lesson is how small $\delta_n$ is for quite moderate values of $n$. For example, in Figure 01, $\delta_n$ is only about 0.01 for $\alpha = 0.05$ when $n = 20$. Again we see that even though we have changed the test statistic to the median, the frequentist's false discovery rate is very small and, in particular, smaller than $\alpha$. More about this is said in Sections 4.4 and 4.5.



4.4. Spiky priors and false discovery rates

We commented in Section 4.1 that a large value of the prior density $g(\theta_0)$ increases the leading term in the expansion of $\delta_n$ (and also of $\varepsilon_n$), so spiky priors can be expected to cause higher false discovery rates. In this section, we address the effect of spiky and flat priors a little more formally.

Consider the general testing problem $H_0: \theta \le \theta_0$ vs. $H_1: \theta > \theta_0$, where the natural parameter space is $\Theta = (\underline\theta, \bar\theta)$.

Fig 5. True and estimated values of $\delta_n$ when the test statistic is the median for the Cauchy-Cauchy case.

Suppose the level-$\alpha$ ($0 < \alpha < 1$) test rejects $H_0$ if $T_n \in C$, where $T_n$ is the test statistic. Let $P_n(\theta) = P_\theta(T_n \in C)$. Let $g(\theta)$ be any fixed density function for $\theta$ and let $g_\tau(\theta) = g(\theta/\tau)/\tau$, $\tau > 0$. Then $g_\tau(\theta)$ is spiky at 0 for small $\tau$ and flat for large $\tau$. When $\theta_0 = 0$, under the prior $g_\tau(\theta)$,

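$$\delta_n(\tau) = P(\theta \le 0 \mid T_n \in C) = \frac{\int_{\underline\theta/\tau}^{0} P_n(\tau y)\, g(y)\,dy}{\int_{\underline\theta/\tau}^{\bar\theta/\tau} P_n(\tau y)\, g(y)\,dy} \tag{34}$$

and

$$\varepsilon_n(\tau) = P(\theta > 0 \mid T_n \notin C) = \frac{\int_{0}^{\bar\theta/\tau} [1 - P_n(\tau y)]\, g(y)\,dy}{\int_{\underline\theta/\tau}^{\bar\theta/\tau} [1 - P_n(\tau y)]\, g(y)\,dy}. \tag{35}$$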

Let, as before, $\lambda = \int_{-\infty}^{0} g(\theta)\,d\theta$; denote the numerator and denominator of (34) by $A_n(\tau)$ and $B_n(\tau)$, and the numerator and denominator of (35) by $\bar A_n(\tau)$ and $\bar B_n(\tau)$. Then we have the following results.

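Proposition 1. If $P_n^-(\theta_0) = \lim_{\theta \to \theta_0^-} P_n(\theta)$ and $P_n^+(\theta_0) = \lim_{\theta \to \theta_0^+} P_n(\theta)$ both exist and are positive, then

$$\lim_{\tau \to 0} \delta_n(\tau) = \frac{\lambda P_n^-(0)}{\lambda P_n^-(0) + (1-\lambda) P_n^+(0)} \tag{36}$$

and

$$\lim_{\tau \to 0} \varepsilon_n(\tau) = \frac{(1-\lambda)[1 - P_n^+(0)]}{\lambda[1 - P_n^-(0)] + (1-\lambda)[1 - P_n^+(0)]}. \tag{37}$$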

Proof. Because $0 \le P_n(\tau y) \le 1$ for all $y$, applying the Lebesgue Dominated Convergence Theorem gives $\lim_{\tau\to 0} A_n(\tau) = \lambda P_n^-(0)$, $\lim_{\tau\to 0} B_n(\tau) = \lambda P_n^-(0) + (1-\lambda)P_n^+(0)$, $\lim_{\tau\to 0} \bar A_n(\tau) = (1-\lambda)[1 - P_n^+(0)]$ and $\lim_{\tau\to 0} \bar B_n(\tau) = \lambda[1 - P_n^-(0)] + (1-\lambda)[1 - P_n^+(0)]$. Substituting in (34) and (35), we get (36) and (37). □
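The limits (36) and (37) are simple enough to evaluate directly. A minimal Python sketch (the function name is hypothetical); it takes $\lambda$ and the one-sided limits $P_n^-(0)$, $P_n^+(0)$ of the power function as inputs, assumed positive and, for $\varepsilon_n$, below one.

```python
def spiky_prior_limits(lam, p_minus, p_plus):
    """Limits of delta_n(tau) and eps_n(tau) as tau -> 0, from (36)-(37).

    lam = prior mass on theta <= 0; p_minus, p_plus = one-sided limits of
    the power function at theta_0 = 0.
    """
    delta = lam * p_minus / (lam * p_minus + (1 - lam) * p_plus)
    eps_num = (1 - lam) * (1 - p_plus)
    eps = eps_num / (lam * (1 - p_minus) + eps_num)
    return delta, eps


print(spiky_prior_limits(0.5, 0.05, 0.05))
print(spiky_prior_limits(0.3, 0.05, 0.05))
```

When the power function is continuous at 0, so that $P_n^-(0) = P_n^+(0)$, the two limits reduce to $\lambda$ and $1 - \lambda$ whatever the common value is.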

Corollary 1. If $0 < \lambda < 1$, $\lim_{\tau\to\infty} P_n(\tau y) = 0$ for all $y < 0$ and $\lim_{\tau\to\infty} P_n(\tau y) = 1$ for all $y > 0$, then $\lim_{\tau\to\infty} \delta_n(\tau) = \lim_{\tau\to\infty} \varepsilon_n(\tau) = 0$.



Proof. Immediate from (34) and (35) by the same dominated convergence argument. □

It can be seen that $P_n^-(0) = P_n^+(0)$ in most testing problems when the test statistic $T_n$ has a continuous power function. This is true for all the problems we discussed in Sections 3 and 4. If, moreover, $g(\theta) > 0$ for all $\theta$, then $0 < \lambda < 1$. As a consequence, $\lim_{\tau\to 0}\delta_n(\tau) = \lambda$, $\lim_{\tau\to 0}\varepsilon_n(\tau) = 1 - \lambda$, and $\lim_{\tau\to\infty}\delta_n(\tau) = \lim_{\tau\to\infty}\varepsilon_n(\tau) = 0$. If $\theta$ is a location parameter, $\theta_0 = 0$ and $g(\theta)$ is symmetric about 0, then $\lim_{\tau\to 0}\delta_n(\tau) = \lim_{\tau\to 0}\varepsilon_n(\tau) = 1/2$.

In other words, the false discovery rates are very small for any n for flat priors 
and roughly 50% for any n for very spiky symmetric priors. This is a qualitatively 
informative observation. 
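Both limits can be seen numerically by evaluating (34) directly. The Python sketch below assumes a concrete setting not fixed by the text: $N(\theta, 1)$ data, $\theta_0 = 0$, rejection when $\sqrt n\,\bar X > z_\alpha$ (so $P_n(\theta) = \Phi(\sqrt n\,\theta - z_\alpha)$), and $g$ the standard normal density. The integrals are truncated to $[-10, 10]$ and computed with the trapezoidal rule; the function name is hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def delta_n_tau(tau, n=10, alpha=0.05, lim=10.0, steps=20000):
    """delta_n(tau) from (34) for the one-sided N(theta, 1) mean test,
    with g the standard normal density (so lambda = 1/2)."""
    z = STD.inv_cdf(1 - alpha)
    root_n = n ** 0.5

    def integrand(y):                      # P_n(tau * y) * g(y)
        return STD.cdf(root_n * tau * y - z) * STD.pdf(y)

    def trap(a, b):
        h = (b - a) / steps
        inner = sum(integrand(a + i * h) for i in range(1, steps))
        return h * (0.5 * (integrand(a) + integrand(b)) + inner)

    numerator = trap(-lim, 0.0)                        # A_n(tau)
    return numerator / (numerator + trap(0.0, lim))    # A_n(tau) / B_n(tau)


for tau in (0.01, 1.0, 100.0):
    print(tau, delta_n_tau(tau))
```

For very small $\tau$ the value approaches $1/2$, and for very large $\tau$ it approaches 0, matching the limits above for this symmetric prior.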



4.5. Pre-experimental promise and post-experimental honesty

We noticed in our examples in Section 4.3 that, for quite small values of $n$, the post-experimental error rate $\delta_n$ was smaller than the pre-experimental assurance, namely $\alpha$. For any given prior $g$, this is true for all large $n$; but clearly we cannot achieve this uniformly over all $g$, or even over large classes of $g$. In order to remain honest, it seems reasonable to demand of a frequentist that $\delta_n$ be smaller than $\alpha$. The question is for what sample sizes the frequentist can typically assert his honesty.

Fig 6. Plots of $n_\alpha(\tau)$ as functions of $\tau$ for the normal-normal test by the mean and the Cauchy-Cauchy test by the median, for selected $\alpha$.

Let us then consider the prior $g_\tau(\theta) = g(\theta/\tau)/\tau$ with $g$ fixed, and the minimum sample size $n$, denoted by $n_\alpha(\tau)$, such that $\delta_n < \alpha$. It can be seen from Proposition 1 that $n_\alpha(\tau)$ goes to $\infty$ as $\tau$ goes to 0. This, of course, was anticipated. What happens when $\tau$ varies from small to large values?

Plots of $n_\alpha(\tau)$ as functions of $\tau$ when the population CDF is $F_\theta(x) = \Phi(x - \theta)$, $g(\theta) = \phi(\theta)$ and the test statistic is $\bar X$ are given in the left panel of Figure 6. The plot shows that $n_\alpha(\tau)$ is non-increasing in $\tau$ for the selected values $\alpha = 0.05$ and $0.01$. Plots of $n_\alpha(\tau)$ when $F_\theta(x) = C(x - \theta)$ and $g(\theta) = c(\theta)$, where $C(\cdot)$ and $c(\cdot)$ are the standard Cauchy CDF and PDF respectively, are given in the right panel of Figure 6.

In both examples, a modest sample size of $n = 15$ suffices for ensuring $\delta_n < \alpha$ if $\tau > 1$. For somewhat more spiky priors with $\tau \approx 0.5$, a sample of size just below 30 is required in the Cauchy-Cauchy case, while in the normal-normal case even $n = 8$ still suffices.

The general conclusion is that, unless the prior is very spiky, a sample of size about 30 ought to ensure that $\delta_n < \alpha$ for traditional values of $\alpha$.
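For the normal-normal test by the mean, $n_\alpha(\tau)$ can be computed exactly with a one-dimensional integral. The Python sketch below assumes the prior $g_\tau = N(0, \tau^2)$ (i.e. $g = \phi$) and the test that rejects when $\sqrt n\,\bar X > z_\alpha$. Because $\theta$ and $\sqrt n\,\bar X$ are jointly normal, with $s = \sqrt{1 + n\tau^2}$ and $U = \sqrt n\,\bar X/s \sim N(0,1)$ one has $P(\theta < 0 \mid U = u) = \Phi(-\sqrt n\,\tau u)$. In this conjugate case $\delta_n$ therefore depends on $n$ and $\tau$ only through $n\tau^2$, so $n_\alpha(\tau)$ should scale roughly like $1/\tau^2$. Function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def delta_normal_normal(n, tau, alpha, upper=12.0, steps=4000):
    """Exact delta_n for N(theta, 1) data, prior N(0, tau^2), theta_0 = 0:
    int_{z/s}^inf phi(u) Phi(-sqrt(n) tau u) du / (1 - Phi(z/s))."""
    z = STD.inv_cdf(1 - alpha)
    s = (1.0 + n * tau * tau) ** 0.5
    k = n ** 0.5 * tau
    lo = z / s
    h = upper / steps
    vals = [STD.pdf(lo + i * h) * STD.cdf(-k * (lo + i * h))
            for i in range(steps + 1)]
    numerator = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
    return numerator / (1.0 - STD.cdf(lo))


def n_alpha(tau, alpha=0.05, n_max=10_000):
    """Smallest n with delta_n < alpha, or None if none up to n_max."""
    for n in range(1, n_max + 1):
        if delta_normal_normal(n, tau, alpha) < alpha:
            return n
    return None


for tau in (0.25, 0.5, 1.0, 2.0):
    print(tau, n_alpha(tau))
```

The Cauchy-Cauchy curve has no such closed-form reduction and would need the exact median density (33) inside a two-dimensional integral.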

Appendix: Detailed expansions for the exponential family 

We now provide the details for the expansions of $A_{n,\theta_{1n}}$ in (13) and $\bar A_{n,\theta_{2n}}$ in (15), and we also prove that $R_{n,\theta_{1n}}$ in (14) and $\bar R_{n,\theta_{2n}}$ in (16) are of smaller order.

Suppose $g(\theta)$ is a three times differentiable proper prior for $\theta$. The expansions are considered for those $\theta_0$ at which the exponential family density has a positive variance. Then we can find two values $\theta_1$ and $\theta_2$ such that $\underline\theta < \theta_1 < \theta_0 < \theta_2 < \bar\theta$ and the minimum value of $\sigma^2(\theta)$ is positive when $\theta_1 \le \theta \le \theta_2$. That is, if we let $m_0 = \min_{\theta_1 \le \theta \le \theta_2} \sigma^2(\theta)$, then $m_0 > 0$. Since $\sigma^2(\theta)$, $\kappa_i(\theta)$, $\rho_i(\theta)$ and $g^{(3)}(\theta)$ are all continuous in $\theta$, each of them is uniformly bounded in absolute value for $\theta \in [\theta_1, \theta_2]$. We denote by $M_0$ the common upper bound of the absolute values of $\sigma^2(\theta)$, $\kappa_i(\theta)$ $(i = 3,4,5)$, $\rho_i(\theta)$ $(i = 3,4,5)$, $g(\theta)$, $g'(\theta)$, $g''(\theta)$ and $g^{(3)}(\theta)$.

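In the rest of this section, we denote $\theta_{1n} = \theta_0 + (\theta_1 - \theta_0)/n^{1/3}$, $\theta_{2n} = \theta_0 + (\theta_2 - \theta_0)/n^{1/3}$, $x_1 = \sigma_0\sqrt n(\theta_1 - \theta_0) - z_\alpha$, $x_2 = \sigma_0\sqrt n(\theta_2 - \theta_0) - z_\alpha$, $x_{1n} = \sigma_0\sqrt n(\theta_{1n} - \theta_0) - z_\alpha$ and $x_{2n} = \sigma_0\sqrt n(\theta_{2n} - \theta_0) - z_\alpha$. As in (13), (14), (15) and (16), we define $A_{n,\theta_{1n}} = P(\theta_{1n} < \theta \le \theta_0, \bar X \in C)$, $R_{n,\theta_{1n}} = A_n - A_{n,\theta_{1n}}$, $\bar A_{n,\theta_{2n}} = P(\theta_0 < \theta < \theta_{2n}, \bar X \notin C)$ and $\bar R_{n,\theta_{2n}} = \bar A_n - \bar A_{n,\theta_{2n}}$, where $A_n$ and $\bar A_n$ are given by © and Q respectively. We write $B_{n,\theta_1} = P(\theta > \theta_{1n}, \bar X \in C)$ and $\bar B_{n,\theta_2} = P(\theta < \theta_{2n}, \bar X \notin C)$. Then one can also see from the definitions that $R_{n,\theta_{1n}} = B_n - B_{n,\theta_1}$ and $\bar R_{n,\theta_{2n}} = \bar B_n - \bar B_{n,\theta_2}$, where $B_n$ and $\bar B_n$ are given by (5) and ^ respectively.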

The following Proposition and Corollary show that $R_{n,\theta_{1n}}$ and $\bar R_{n,\theta_{2n}}$ are of smaller order. Therefore, the coefficients of the expansions of $A_n$ and $\bar A_n$ are exactly the same as those of $A_{n,\theta_{1n}}$ and $\bar A_{n,\theta_{2n}}$.

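Proposition 2. Let $\theta_{1,\tau,n} = \theta_0 + (\theta_1 - \theta_0)/n^{\tau}$ and $\theta_{2,\tau,n} = \theta_0 + (\theta_2 - \theta_0)/n^{\tau}$. If $0 < \tau < 1/2$, then for any $l < \infty$, $\lim_{n\to\infty} n^l \beta_n(\theta_{1,\tau,n}) = \lim_{n\to\infty} n^l[1 - \beta_n(\theta_{2,\tau,n})] = 0$.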

Proof. A proof of this can be obtained by simply using Markov's inequality. We 
omit it. □ 

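Corollary 2. For any $l > 0$, $\lim_{n\to\infty} n^l R_{n,\theta_{1n}} = \lim_{n\to\infty} n^l \bar R_{n,\theta_{2n}} = 0$.
Proof. Since $\beta_n(\theta)$ is nondecreasing in $\theta$, we have

$$n^l R_{n,\theta_{1n}} = n^l \int_{\underline\theta}^{\theta_{1n}} \beta_n(\theta)\, g(\theta)\,d\theta \le n^l \beta_n(\theta_{1,1/3,n}) \int_{\underline\theta}^{\theta_{1n}} g(\theta)\,d\theta \le n^l \beta_n(\theta_{1,1/3,n}),$$

and similarly $n^l \bar R_{n,\theta_{2n}} \le n^l[1 - \beta_n(\theta_{2,1/3,n})]$. The conclusion follows by taking $\tau = 1/3$ in Proposition 2. □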
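Proposition 2 can be illustrated in the simplest case. The Python sketch below assumes $N(\theta, 1)$ data, $\theta_0 = 0$, $\theta_1 = -1$, $\alpha = 0.05$ and the one-sided test based on $\bar X$ (all illustrative choices), for which $\beta_n(\theta) = \Phi(\sqrt n\,\theta - z_\alpha)$. Then $\beta_n(\theta_{1,\tau,n}) = \Phi(-n^{1/2-\tau} - z_\alpha)$, which decays like $\exp(-n^{1-2\tau}/2)$, faster than any power of $n$, although for moderate $n$ the scaled quantity can still be of order one. The function names are hypothetical; $\Phi$ is computed via `math.erfc` to stay accurate deep in the tail.

```python
import math


def std_normal_cdf(x):
    """Phi(x) via erfc, which stays accurate far into the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def scaled_power(n, l=2, tau=1 / 3, theta1=-1.0, z_alpha=1.6448536269514722):
    """n^l * beta_n(theta_{1,tau,n}) for the N(theta, 1) mean test, theta_0 = 0."""
    theta = theta1 / n ** tau              # theta_{1,tau,n} - theta_0
    return n ** l * std_normal_cdf(math.sqrt(n) * theta - z_alpha)


for n in (10 ** 3, 10 ** 6, 10 ** 9):
    print(n, scaled_power(n))
```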

In the rest of this section, we derive only the expansion of $A_{n,\theta_{1n}}$ in detail, since the expansion of $\bar A_{n,\theta_{2n}}$ is obtained in exactly the same way.

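Using the transformation $x = \sigma_0\sqrt n(\theta - \theta_0) - z_\alpha$ in the following integral, we have

$$A_{n,\theta_{1n}} = \frac{1}{\sigma_0\sqrt n} \int_{x_{1n}}^{-z_\alpha} \beta_n\Big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\Big)\, g\Big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\Big)\,dx. \tag{38}$$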



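Note that

$$\beta_n\Big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\Big) = P_{\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}}\left(\frac{\sqrt n\,\big[\bar X - \mu\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)\big]}{\sigma\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)} > k_{\theta_0,x,n}\right), \tag{39}$$

where

$$k_{\theta_0,x,n} = \frac{\sigma_0}{\sigma\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)}\left[\frac{\sqrt n\,\big[\mu_0 - \mu\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)\big]}{\sigma_0} + k_{\theta_0,n}\right]. \tag{40}$$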



208 



A. DasGupta and T. Zhang 



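We obtain the coefficients of the expansion of $A_{n,\theta_{1n}}$ in the following steps:

1. The expansion of $g\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)$ for any fixed $x \in [x_{1n}, -z_\alpha]$ is obtained by using Taylor expansions.

2. The expansion of $k_{\theta_0,x,n}$ for any fixed $x \in [x_{1n}, -z_\alpha]$ is obtained by jointly considering the Cornish-Fisher expansion of $k_{\theta_0,n}$, the Taylor expansion of $\sqrt n\,\big[\mu_0 - \mu\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)\big]/\sigma_0$ and the Taylor expansion of $\sigma_0/\sigma\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)$.

3. Write the CDF of $\bar X$ in the form $P_\theta\big[\sqrt n(\bar X - \mu(\theta))/\sigma(\theta) \le u\big]$. Formally substitute $\theta = \theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}$ and $u = k_{\theta_0,x,n}$ in the Edgeworth expansion of the CDF of $\bar X$. An expansion of $\beta_n\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)$ is obtained by combining it with Taylor expansions for a number of relevant functions (see (47)).

4. The expansion of $A_{n,\theta_{1n}}$ is obtained by considering the product of the expansions of $g\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)$ and $\beta_n\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)$ under the integral sign.

5. Finally, prove that all the error terms in Steps 1, 2, 3 and 4 are of smaller order.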



We give the expansions in steps 1, 2, 3 and 4 in detail. For the error term study 
in step 5, we omit the details due to the considerably tedious algebra. 

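Step 1: The expansion of $g\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)$ is easily obtained by using a Taylor expansion:

$$g\Big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\Big) = g(\theta_0) + g'(\theta_0)\,\frac{x+z_\alpha}{\sigma_0\sqrt n} + \frac{g''(\theta_0)}{2}\,\frac{(x+z_\alpha)^2}{\sigma_0^2\, n} + r_{g,x,n}, \tag{41}$$

where $r_{g,x,n}$ is the error term.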



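Step 2: The Cornish-Fisher expansion of $k_{\theta_0,n}$ ([1], p. 117) is given by

$$k_{\theta_0,n} = z_\alpha + \frac{\rho_{30}(z_\alpha^2 - 1)}{6\sqrt n} + \frac{1}{n}\left[\frac{(z_\alpha^3 - 3z_\alpha)\,\rho_{40}}{24} - \frac{(2z_\alpha^3 - 5z_\alpha)\,\rho_{30}^2}{36}\right] + r_{1,n}, \tag{42}$$

where $r_{1,n}$ is the error term.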
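The Cornish-Fisher approximation (42) is easy to try out on a case with a known exact quantile. The Python sketch below (hypothetical function name) drops $r_{1,n}$ and uses Exp(1) observations, whose standardized cumulants are $\rho_{30} = 2$ and $\rho_{40} = 6$. For $n = 10$ the sum is Gamma(10, 1), so its exact 0.95 quantile is half of the $\chi^2_{20}$ 0.95 quantile, 31.410.

```python
from statistics import NormalDist


def cornish_fisher_quantile(alpha, n, rho3, rho4):
    """Approximate k_{theta0,n}, the upper-alpha quantile of
    sqrt(n) (Xbar - mu_0) / sigma_0, by (42) with r_{1,n} dropped.

    rho3, rho4 are the standardized cumulants rho_30, rho_40 of one observation.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    first = rho3 * (z ** 2 - 1) / (6 * n ** 0.5)
    second = ((z ** 3 - 3 * z) * rho4 / 24
              - (2 * z ** 3 - 5 * z) * rho3 ** 2 / 36) / n
    return z + first + second


# Exp(1) observations: rho_30 = 2, rho_40 = 6.
k10 = cornish_fisher_quantile(0.05, 10, 2.0, 6.0)
exact = (31.410 / 2 - 10) / 10 ** 0.5   # standardized Gamma(10, 1) 0.95 quantile
print(k10, exact)
```

Compared with the normal quantile $z_{0.05} \approx 1.645$, the two correction terms move the approximation most of the way toward the skewed exact value.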

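The Taylor expansion of the first term inside the bracket of (40) is

$$\frac{\sqrt n\,\big[\mu_0 - \mu\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)\big]}{\sigma_0} = -(x+z_\alpha) - \frac{\rho_{30}(x+z_\alpha)^2}{2\sqrt n} - \frac{\rho_{40}(x+z_\alpha)^3}{6n} + r_{2,x,n}, \tag{43}$$

and the Taylor expansion of the term outside the bracket of (40) is

$$\frac{\sigma_0}{\sigma\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)} = 1 - \frac{\rho_{30}(x+z_\alpha)}{2\sqrt n} + \left(\frac{3\rho_{30}^2}{8} - \frac{\rho_{40}}{4}\right)\frac{(x+z_\alpha)^2}{n} + r_{3,x,n}, \tag{44}$$

where $r_{2,x,n}$ and $r_{3,x,n}$ are error terms.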

Plugging (42), (43) and (44) into (40), we get the expansion of $k_{\theta_0,x,n}$ below:

$$k_{\theta_0,x,n} = -x + \frac{1}{\sqrt n} f_1(x) + \frac{1}{n} f_2(x) + r_{4,x,n}, \tag{45}$$

where $r_{4,x,n}$ is the error term, $f_1(x) = f_{11}x + f_{10}$ and $f_2(x) = f_{23}x^3 + f_{22}x^2 + f_{21}x + f_{20}$, and the coefficients of $f_1(x)$ and $f_2(x)$ are $f_{10} = -(2z_\alpha^2 + 1)\rho_{30}/6$, $f_{11} = -z_\alpha\rho_{30}/2$, $f_{20} = (z_\alpha^3 + 2z_\alpha)\rho_{30}^2/9 - (z_\alpha^3 + z_\alpha)\rho_{40}/8$, $f_{21} = (7z_\alpha^2/24 + 1/12)\rho_{30}^2 - z_\alpha^2\rho_{40}/4$, $f_{22} = 0$, $f_{23} = \rho_{40}/12 - \rho_{30}^2/8$.
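A useful consistency check on (45): at $x = -z_\alpha$ the point $\theta_0 + (x+z_\alpha)/(\sigma_0\sqrt n)$ is $\theta_0$ itself, so (40) gives $k_{\theta_0,x,n} = k_{\theta_0,n}$. Hence $f_1(-z_\alpha)$ and $f_2(-z_\alpha)$ must reproduce the $1/\sqrt n$ and $1/n$ coefficients of the Cornish-Fisher expansion (42). The Python sketch below (hypothetical function names) verifies this for several values of $\alpha$, $\rho_{30}$ and $\rho_{40}$.

```python
from statistics import NormalDist


def f_polynomials(x, z, rho3, rho4):
    """f_1(x) and f_2(x) of (45), with the coefficients listed after it."""
    f10 = -(2 * z ** 2 + 1) * rho3 / 6
    f11 = -z * rho3 / 2
    f20 = (z ** 3 + 2 * z) * rho3 ** 2 / 9 - (z ** 3 + z) * rho4 / 8
    f21 = (7 * z ** 2 / 24 + 1 / 12) * rho3 ** 2 - z ** 2 * rho4 / 4
    f22 = 0.0
    f23 = rho4 / 12 - rho3 ** 2 / 8
    return f11 * x + f10, f23 * x ** 3 + f22 * x ** 2 + f21 * x + f20


def cornish_fisher_terms(z, rho3, rho4):
    """The 1/sqrt(n) and 1/n coefficients in the Cornish-Fisher expansion (42)."""
    return (rho3 * (z ** 2 - 1) / 6,
            (z ** 3 - 3 * z) * rho4 / 24 - (2 * z ** 3 - 5 * z) * rho3 ** 2 / 36)


z05 = NormalDist().inv_cdf(0.95)
print(f_polynomials(-z05, z05, 2.0, 6.0), cornish_fisher_terms(z05, 2.0, 6.0))
```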
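Step 3: The Edgeworth expansion of the CDF of $\bar X$ (Barndorff-Nielsen and Cox ([1], p. 91) and Hall (Qj]|, p. 45)) is given below:

$$P_\theta\left(\frac{\sqrt n\,[\bar X - \mu(\theta)]}{\sigma(\theta)} \le u\right) = \Phi(u) - \phi(u)\left[\frac{\rho_3(\theta)(u^2 - 1)}{6\sqrt n} + \frac{1}{n}\left(\frac{\rho_4(\theta)(u^3 - 3u)}{24} + \frac{\rho_3^2(\theta)(u^5 - 10u^3 + 15u)}{72}\right)\right] + r_{5,n}, \quad \theta = \theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}, \tag{46}$$

where $r_{5,n}$ is an error term. If we take $u = k_{\theta_0,x,n}$ in (46), then the left side is $1 - \beta_n\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)$, and so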



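$$\beta_n\Big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\Big) = \Phi(-k_{\theta_0,x,n}) + \phi(k_{\theta_0,x,n})\left[\frac{\rho_3(\theta)\,(k_{\theta_0,x,n}^2 - 1)}{6\sqrt n} + \frac{1}{n}\left(\frac{\rho_4(\theta)\,(k_{\theta_0,x,n}^3 - 3k_{\theta_0,x,n})}{24} + \frac{\rho_3^2(\theta)\,(k_{\theta_0,x,n}^5 - 10k_{\theta_0,x,n}^3 + 15k_{\theta_0,x,n})}{72}\right)\right] - r_{5,n}, \quad \theta = \theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}. \tag{47}$$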
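To see the Edgeworth expansion (46) at work, here is a Python sketch (hypothetical function names) that compares it, with $r_{5,n}$ dropped, against an exact CDF. It assumes i.i.d. Exp(1) observations ($\rho_3 = 2$, $\rho_4 = 6$), for which the sum of $n$ observations is Gamma($n$, 1) and, for integer $n$, $P(S \le s) = 1 - \sum_{j<n} e^{-s}s^j/j!$.

```python
import math
from statistics import NormalDist


def edgeworth_cdf(u, n, rho3, rho4):
    """Right side of (46) without r_{5,n}: approximation to
    P(sqrt(n) (Xbar - mu) / sigma <= u)."""
    std = NormalDist()
    he2 = u ** 2 - 1                       # Hermite polynomials He_2, He_3, He_5
    he3 = u ** 3 - 3 * u
    he5 = u ** 5 - 10 * u ** 3 + 15 * u
    correction = (rho3 * he2 / (6 * math.sqrt(n))
                  + (rho4 * he3 / 24 + rho3 ** 2 * he5 / 72) / n)
    return std.cdf(u) - std.pdf(u) * correction


def exact_exp_mean_cdf(u, n):
    """Exact P(sqrt(n) (Xbar - 1) <= u) for i.i.d. Exp(1) data, integer n."""
    s = n + u * math.sqrt(n)
    if s <= 0:
        return 0.0
    term, tail = math.exp(-s), 0.0
    for j in range(n):                     # accumulate e^{-s} s^j / j!, j < n
        tail += term
        term *= s / (j + 1)
    return 1.0 - tail


print(edgeworth_cdf(1.0, 10, 2.0, 6.0), exact_exp_mean_cdf(1.0, 10))
```

At $u = 1$ and $n = 10$, the $1/n$ terms account for most of the gap between $\Phi(1)$ and the exact value.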



Plug the Taylor expansion

$$\rho_3\Big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\Big) = \rho_{30} + \Big(\rho_{40} - \frac{3}{2}\rho_{30}^2\Big)\frac{x+z_\alpha}{\sqrt n} + r_{6,x,n} \tag{48}$$

into (47), where $r_{6,x,n}$ is an error term. Then consider the Taylor expansions of the three terms involving $k_{\theta_0,x,n}$ in (47), and also use the expansion (45). After quite a bit of calculation, we obtain the following expansion:

$$\beta_n\Big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\Big) = \Phi(x) + \frac{\phi(x)}{\sqrt n}\, g_1(x) + \frac{\phi(x)}{n}\, g_2(x) + r_{7,x,n}, \tag{49}$$

where $r_{7,x,n}$ is an error term, $g_1(x) = g_{12}x^2 + g_{11}x + g_{10}$, $g_2(x) = g_{20} + g_{21}x + g_{22}x^2 + g_{23}x^3 + g_{24}x^4 + g_{25}x^5$, and the coefficients of $g_1(x)$ and $g_2(x)$ are $g_{12} = \rho_{30}/6$, $g_{11} = z_\alpha\rho_{30}/2$, $g_{10} = z_\alpha^2\rho_{30}/3$, $g_{25} = \rho_{30}^2/72$, $g_{24} = -z_\alpha\rho_{30}^2/12$, $g_{23} = \rho_{40}/8 - 13z_\alpha^2\rho_{30}^2/72 - 7\rho_{30}^2/24$, $g_{22} = z_\alpha\rho_{40}/6 - z_\alpha^3\rho_{30}^2/6 - z_\alpha\rho_{30}^2/12$, $g_{21} = (z_\alpha^2/4 - 7/24)\rho_{40} - z_\alpha^4\rho_{30}^2/18 - 13z_\alpha^2\rho_{30}^2/72 + 4\rho_{30}^2/9$ and $g_{20} = (z_\alpha^3/8 - z_\alpha/24)\rho_{40} - (z_\alpha^3/9 - z_\alpha/36)\rho_{30}^2$.
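The first-order part of (49) can be checked independently. To first order, $\Phi(-k_{\theta_0,x,n})$ contributes $-\phi(x)f_1(x)/\sqrt n$ and the skewness term of (47) contributes $\phi(x)\rho_{30}(x^2 - 1)/(6\sqrt n)$, so $g_1(x)$ should equal $-f_1(x) + \rho_{30}(x^2-1)/6$. Moreover, at $x = -z_\alpha$ (i.e. $\theta = \theta_0$) the power is exactly $\alpha = \Phi(-z_\alpha)$, so $g_1(-z_\alpha)$ must vanish. A Python sketch with hypothetical function names:

```python
from statistics import NormalDist


def g1(x, z, rho3):
    """g_1(x) = g_12 x^2 + g_11 x + g_10 from (49)."""
    return rho3 / 6 * x ** 2 + z * rho3 / 2 * x + z ** 2 * rho3 / 3


def g1_from_pieces(x, z, rho3):
    """-f_1(x) + rho_30 (x^2 - 1)/6, using f_1 from (45)."""
    f1 = -z * rho3 / 2 * x - (2 * z ** 2 + 1) * rho3 / 6
    return -f1 + rho3 * (x ** 2 - 1) / 6


z05 = NormalDist().inv_cdf(0.95)
print(g1(-z05, z05, 2.0), g1(0.4, z05, 2.0), g1_from_pieces(0.4, z05, 2.0))
```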



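Step 4: The expansion of $A_{n,\theta_{1n}}$ is obtained by plugging the expansions of $\beta_n\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)$ and $g\big(\theta_0 + \frac{x+z_\alpha}{\sigma_0\sqrt n}\big)$ into (38). After careful calculation,

$$A_{n,\theta_{1n}} = \frac{a_1}{\sqrt n} + \frac{a_2}{n} + \frac{a_3}{n^{3/2}} + r_{8,n}, \tag{50}$$

where $r_{8,n}$ is an error term, $a_1 = (g(\theta_0)/\sigma_0)[\phi(z_\alpha) - \alpha z_\alpha]$, $a_2 = \rho_{30}\, g(\theta_0)[\alpha + 2\alpha z_\alpha^2 - 2z_\alpha\phi(z_\alpha)]/(6\sigma_0) - g'(\theta_0)[\alpha(z_\alpha^2 + 1) - z_\alpha\phi(z_\alpha)]/(2\sigma_0^2)$, and $a_3 = [h_{11}\phi(z_\alpha) + \alpha h_{12}][g''(\theta_0)/(6\sigma_0^3)] + [h_{21}\phi(z_\alpha) + \alpha h_{22}][g'(\theta_0)/\sigma_0^2] + [h_{31}\phi(z_\alpha) + \alpha h_{32}][g(\theta_0)/\sigma_0]$, where $h_{11} = z_\alpha^2 + 2$, $h_{12} = -(z_\alpha^3 + 3z_\alpha)$, $h_{21} = -(\rho_{30}/3)(z_\alpha^2 + 1)$, $h_{22} = (\rho_{30}/3)(z_\alpha^3 + 2z_\alpha)$, $h_{31} = -z_\alpha^4\rho_{30}^2/36 + 4z_\alpha^2\rho_{30}^2/9 + \rho_{30}^2/36 - 5z_\alpha^2\rho_{40}/24 + \rho_{40}/24$ and $h_{32} = -5z_\alpha^3\rho_{30}^2/18 - 11z_\alpha\rho_{30}^2/36 + z_\alpha^3\rho_{40}/8 + z_\alpha\rho_{40}/8$. These $a_1$, $a_2$ and $a_3$ are the coefficients in the expansion (23).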
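Several pieces of $a_1$, $a_2$ and $a_3$ rest on elementary normal integrals. Multiplying the leading $\Phi(x)$ term of (49) by the Taylor terms of (41) inside (38) produces $\int_{-\infty}^{-z_\alpha}(x+z_\alpha)^j\Phi(x)\,dx$ for $j = 0, 1, 2$. The closed forms of these integrals are $\phi(z_\alpha) - \alpha z_\alpha$, $-[\alpha(z_\alpha^2+1) - z_\alpha\phi(z_\alpha)]/2$ and $[h_{11}\phi(z_\alpha) + \alpha h_{12}]/3$; they give the $g(\theta_0)$ term of $a_1$, the $g'(\theta_0)$ term of $a_2$ and the $g''(\theta_0)$ term of $a_3$. The Python sketch below checks them by Simpson's rule, truncating $-\infty$ at $-12$ (an arbitrary but harmless choice).

```python
from statistics import NormalDist

STD = NormalDist()


def simpson(h, a, b, steps=20000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    dx = (b - a) / steps
    total = h(a) + h(b)
    total += 4 * sum(h(a + (2 * i - 1) * dx) for i in range(1, steps // 2 + 1))
    total += 2 * sum(h(a + 2 * i * dx) for i in range(1, steps // 2))
    return total * dx / 3


alpha = 0.05
z = STD.inv_cdf(1 - alpha)
phi_z = STD.pdf(z)
LOW = -12.0  # stands in for -infinity; the neglected tail is negligible

i0 = simpson(lambda x: STD.cdf(x), LOW, -z)
i1 = simpson(lambda x: (x + z) * STD.cdf(x), LOW, -z)
i2 = simpson(lambda x: (x + z) ** 2 * STD.cdf(x), LOW, -z)

closed0 = phi_z - alpha * z                                   # a_1 / (g(theta0)/sigma0)
closed1 = -(alpha * (z ** 2 + 1) - z * phi_z) / 2             # g'-part of a_2
closed2 = ((z ** 2 + 2) * phi_z - alpha * (z ** 3 + 3 * z)) / 3  # g''-part of a_3
print(i0 - closed0, i1 - closed1, i2 - closed2)
```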

The computation of the coefficients of the expansion of $A_{n,\theta_{1n}}$ is now complete. The rest of the work is to prove that all the error terms are of smaller order. But first we give the results for the expansion of $\bar A_{n,\theta_{2n}}$; the details of its derivation are omitted.

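Expansion of $\bar A_{n,\theta_{2n}}$: The expansion of $\bar A_{n,\theta_{2n}}$ can be obtained by simply repeating all the steps for $A_{n,\theta_{1n}}$. The results are given below:

$$\bar A_{n,\theta_{2n}} = \frac{\bar a_1}{\sqrt n} + \frac{\bar a_2}{n} + \frac{\bar a_3}{n^{3/2}} + r_{9,n}, \tag{51}$$

where $r_{9,n}$ is an error term, $\bar a_1 = g(\theta_0)[\phi(z_\alpha) + (1-\alpha)z_\alpha]/\sigma_0$, $\bar a_2 = g'(\theta_0)[(1-\alpha)(z_\alpha^2 + 1) + z_\alpha\phi(z_\alpha)]/(2\sigma_0^2) - \rho_{30}\, g(\theta_0)[(1-\alpha)/6 + (1-\alpha)z_\alpha^2/3 + z_\alpha\phi(z_\alpha)/3]/\sigma_0$, and $\bar a_3 = (g''(\theta_0)/(6\sigma_0^3))[h_{11}\phi(z_\alpha) - (1-\alpha)h_{12}] - (g'(\theta_0)/\sigma_0^2)[-h_{21}\phi(z_\alpha) + (1-\alpha)h_{22}] - (g(\theta_0)/\sigma_0)[-h_{31}\phi(z_\alpha) + (1-\alpha)h_{32}]$, where $h_{11}, h_{12}, h_{21}, h_{22}, h_{31}$ and $h_{32}$ are as defined in Step 4. These $\bar a_1$, $\bar a_2$ and $\bar a_3$ are the coefficients in the expansion (24).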

Remark. The coefficients of the expansions of $\delta_n$ and $\varepsilon_n$ are obtained by simply using formula (17) with $a_1, a_2, a_3$ in (23) and $\bar a_1, \bar a_2, \bar a_3$ in (24), respectively.

Step 5 (Error term study in the expansion of $A_{n,\theta_{1n}}$). We give only the main steps because the details are too long. Recall from (38) that the range of integration for $A_{n,\theta_{1n}}$ is $x_{1n} \le x \le -z_\alpha$. In this case, we have $\lim_{n\to\infty} x_{1n} = -\infty$ and $\lim_{n\to\infty} x_{1n}/\sqrt n = 0$. This fact is used when we prove that an error term is still of smaller order after it is moved out of the integral sign.
(I) In (41), since $g^{(3)}(\theta)$ is uniformly bounded in absolute value, $r_{g,x,n}$ is absolutely bounded by a constant times $n^{-3/2}|x + z_\alpha|^3$.

(II) From Barndorff-Nielsen and Cox ([1], p. 117), the error term $r_{1,n}$ in (42) is absolutely uniformly bounded by a constant times $n^{-3/2}$.

(III) In (43) and (44), since $\rho_i(\theta)$ and $\kappa_i(\theta)$ $(i = 3,4,5)$ are uniformly bounded in absolute value, the error term $r_{2,x,n}$ is absolutely bounded by a constant times $n^{-3/2}(x + z_\alpha)^4$, and the error term $r_{3,x,n}$ is absolutely bounded by a constant times $n^{-3/2}|x + z_\alpha|^3$.

(IV) The exact form of the error term $r_{4,x,n}$ in (45) can be derived by considering the higher order terms, and their products, in (42), (43) and (44) used in deriving (45). The computation is complicated but straightforward. Since $\rho_i(\theta)$ and $\kappa_i(\theta)$ $(i = 3,4,5)$ are uniformly bounded in absolute value, $r_{4,x,n}$ is absolutely bounded by $n^{-3/2}P_1(|x|)$, where $P_1(|x|)$ is a seventh degree polynomial whose coefficients do not depend on $n$.

(V) Again, from Barndorff-Nielsen and Cox ([1], p. 91), the error term $r_{5,n}$ in (46) is absolutely bounded by a constant times $n^{-3/2}$.

(VI) The error term $r_{6,x,n}$ in (48) is absolutely bounded by a constant times $n^{-1}(x + z_\alpha)^2$, since $\rho_i(\theta)$ and $\kappa_i(\theta)$ $(i = 3,4,5)$ are uniformly bounded in absolute value.
(VII) This is the critical step of the error term study, since we need to prove that the error term is still of smaller order when it is moved out of the integral in (50). We need to study the behavior of $\Phi(-k_{\theta_0,x,n})$ and $\phi(k_{\theta_0,x,n})$ as $n \to \infty$, uniformly for all $x \in [x_{1n}, -z_\alpha]$ (see (49)). This also explains why we chose $\theta_{1n} = \theta_0 + (\theta_1 - \theta_0)/n^{1/3}$ and $x_{1n} = \sigma_0\sqrt n(\theta_{1n} - \theta_0) - z_\alpha$ at the beginning of this section: with this choice, $|k_{\theta_0,x,n} + x|$ is uniformly bounded by $|x|/2 + 1$ for sufficiently large $n$. Then, for sufficiently large $n$, the error term $r_{7,x,n}$ in (49) is uniformly bounded by $|r_{7,x,n}| \le \phi(x/2 + 1)P_2(|x|)$, where $P_2(|x|)$ is a twelfth degree polynomial in $|x|$ whose coefficients do not depend on $n$.

(VIII) Finally, we can show that the error term $r_{8,n}$ in (50) is $O(n^{-2})$. This is tedious but straightforward; it is proven by considering each of the ten terms in $r_{8,n}$ separately.

Remark. We can similarly prove that the error term $r_{9,n}$ in (51), corresponding to $\bar A_{n,\theta_{2n}}$, is $O(n^{-2})$. Since the steps are very similar, we omit them.